ABSTRACT

A long-standing question in cosmology is whether gravitational lensing changes the
distance-redshift relation $D(z)$ or the mean flux density of sources. Interest in this has
been rekindled by recent studies in non-linear relativistic perturbation theory that
find biases in both the area of a surface of constant redshift and in the mean
distance to this surface, with a fractional bias in both cases on the order of the mean
squared convergence $\langle\kappa^2\rangle$. Any such area bias could alter CMB cosmology, and the
corresponding bias in mean flux density could affect supernova cosmology. Here we
show that, in an ensemble averaged sense, the perturbation to the area of a surface
of constant redshift is in reality much smaller, being on the order of the cumulative
bending angle squared, or roughly a part-in-a-million effect. This validates the
arguments of Weinberg (1976) that the mean magnification $\mu$ of sources is unity and of
Kibble & Lieu (2005) that the mean direction-averaged inverse magnification is unity.
It also validates the conventional treatment of lensing in analysis of CMB anisotropies.
But the existence of a scatter in magnification will cause any non-linear function of
these conserved quantities to be statistically biased. The distance $D$, for example, is
proportional to $\mu^{-1/2}$, so lensing will bias $\langle D\rangle$ even if $\langle\mu\rangle = 1$. The fractional bias in
such quantities is generally of order $\langle\kappa^2\rangle$, which is orders of magnitude larger than
the area perturbation. Claims for large bias in area or flux density of sources appear
to have resulted from misinterpretation of such effects: they do not represent a new
non-Newtonian effect, nor do they invalidate standard cosmological analyses.

Key words: Cosmology: theory, observations, distance scale, large-scale structure, 
cosmic background radiation 


1 INTRODUCTION 

In homogeneous and isotropic cosmologies the ratio between
the proper size of a source and the angle subtended at the
observer - the angular diameter distance $D$ - is solely a
function of redshift. In an inhomogeneous universe,
gravitational lensing by intervening metric fluctuations can cause
magnification of the angular size - with an associated change of
flux density, since surface brightness is unaffected by lensing.
Thus the apparent distance to objects at a given $z$ becomes
in effect a randomly fluctuating quantity. Equivalently, the
flux density measured on a sphere surrounding an object at
redshift $z$ is a random function of position on the sphere.
The question we shall address here is whether distances or
flux densities are perturbed in the mean.

This subject has a long history with pioneering studies
by Zel’dovich (1964) and Feynman (in a colloquium at the
California Institute of Technology in 1964; see Gunn 1967b)
with detailed calculations using point masses performed by
Bertotti (1966), using the ‘optical scalar’ formalism of Sachs
(1961), and by Gunn (1967a,b). Swiss-cheese models
(Einstein & Straus, 1945) were used by Kantowski (1969) and
later by Dyer & Roeder (1972, 1974) who generalised
Kantowski’s results to include a cosmological constant. These
works suggested that there is a non-vanishing perturbation
to the mean flux densities of distant sources caused by
intervening structures, at least for sources that are viewed along
lines of sight that avoid mass concentrations.

1.1 Flux conservation 

Weinberg (1976), however, argued via conservation of
photons that for transparent lenses there could be no mean flux
density amplification and that the uniform universe formula
for $D(z)$ remains valid. The apparent distance $D$ of a source
at a fixed $z$ is, by definition, proportional to $1/\sqrt{\Omega}$ where
$\Omega$ is the solid angle a ‘standard source’ subtends (or would
if resolved), while conservation of surface brightness means
that the flux density $S$ is proportional to $\Omega$. In terms of the
magnification $\mu = S/S_0$, where $S$ is the actual flux density
and $S_0$ is the flux density a standard source would have at
the same $z$ if the structure were smoothed out, Weinberg
says that $\langle\mu\rangle_A = 1$, where the averaging is over sources, or
equivalently over area on the source sphere (hence the
subscript $A$). Alternatively, one can say that $\langle D_0^2/D^2\rangle_A = 1$,
where $D_0$ is the angular diameter distance in the smoothed
out background. This result, however, rests on the implicit
assumption that the area of the constant-$z$ surface is
unaffected by lensing.

This invariance of the mean flux density, however,
appears to contradict a well-known theorem of gravitational
lensing, stating that at least one image is always magnified
(Schneider 1984; Ehlers & Schneider 1986; Seitz & Schneider
1992). Taking a somewhat different approach, Seitz,
Schneider & Ehlers (1994) have used the optical scalars formalism
of Sachs (1961) to show that the square root of the proper
area of a narrow bundle of rays $D = \sqrt{A}$ obeys the ‘focusing
equation’:

$$\ddot{D}/D = -(R + \Sigma^2). \tag{1}$$

Here $\ddot{D}$ is the second derivative of $D$ with respect to affine
distance along the bundle; $R = R_{\alpha\beta}k^\alpha k^\beta/2$ is the local Ricci
focusing from matter in the beam, which for non-relativistic
velocities is just proportional to the matter density; and
$\Sigma^2$ is the squared rate of shear from the integrated effect
of up-beam Weyl focusing - i.e. the tidal field of matter
outside the beam. The resulting focusing theorem is that the
RHS of (1) is non-positive, so that beams are always focused
to smaller sizes, at least as compared to empty space-time,
where beams obey $\ddot{D} = 0$ (see Schneider, Ehlers & Falco
1992 and Narlikar 2010 for further details and discussion).

In the cosmological context Seitz, Schneider & Ehlers
(1994) therefore state that “a light beam cannot be less
focused than a reference beam that is unaffected by matter
inhomogeneities”, at least up until caustic formation, and “no
source can appear fainter (...) than in the case that there are
no matter inhomogeneities close to the line-of-sight to the
source”. But it would be incorrect to conclude that
inhomogeneities always cause magnification: this analysis actually
compares the flux density of sources in a universe containing
a uniform density component plus localised positive density
lenses with sources in a universe containing only the uniform
component. This is not quite the same as the real question
of interest, which is the mean degree of focusing caused by
perturbations about the mean density - i.e. lenses whose
density can be negative as well as positive.

In a spatially flat FRW model, bundles of rays
emanating from a source or observer travel in straight lines
at a constant speed in conformal coordinates, so also obey
$\ddot{D} = 0$. For general weak-field perturbations to such a model,
an appendix proves an analogue of (1) where the RHS is
$-(\delta R + \Sigma^2)$. For weakly perturbed bundles with $D$ close to
$D_0$, the unperturbed distance to redshift $z$, we can average
this equation, assuming $\langle\delta R\rangle$ vanishes and setting $D = D_0$
in the denominator, to obtain the linearised averaged
focusing theorem

$$\langle\ddot{D}\rangle/D_0 = -\langle\Sigma^2\rangle < 0. \tag{2}$$

This implies that $\langle D\rangle < D_0$, so objects viewed through
inhomogeneity have distances that are systematically decreased
even when we allow correctly for the fact that the mean
mass of lenses is zero.

The transport equation for the rate of shear $\Sigma$ (see the
appendices) shows that, in the perturbative regime at least,
the resulting mean change in the distance from this
cumulative effect of tidal shearing of beams by up-beam structure
is, at leading order, $\langle\Delta D\rangle/D_0 \sim \langle\kappa^2\rangle$, where $\kappa$ is the usual
first order lensing convergence and $\Delta D = D - D_0$. The
convergence for galaxies at $z \sim 1$ is on the order of 1% at
degree scales, rising to a few percent for the cosmic microwave
background (CMB) at $z \sim 1000$, so the mean squared value
is $\langle\kappa^2\rangle \sim 10^{-3}$ (e.g. Seljak 1996), which is non-negligible.
Furthermore, $\langle\kappa^2\rangle$ is a strongly decreasing function of
averaging scale, so there is potentially a large effect for compact
sources such as supernovae at high redshift.

While interesting and suggestive, one should not
necessarily conclude that (2) invalidates Weinberg’s argument
that $\langle D_0^2/D^2\rangle_A = 1$. First, the focusing theorem is concerned
with $\langle D/D_0\rangle$, which is not the same thing, and second the
focusing equation provides the apparent distance to the far
end of a ray propagated along some chosen direction from
the observer. Averaging this, as we shall discuss in more
detail presently, is not the same as averaging over sources.

1.2 Lensing and the CMB 

The subject has received much further attention over the
years, though with varied results, and the scope has
expanded to incorporate lensing of the CMB.

A significant general development came from Kibble &
Lieu (2005), who emphasised the important distinction
between averaging over sources - which is appropriate for SNIa
cosmology - and averaging over directions on the observer’s
sky - which is more appropriate for CMB studies. They went
on to show that, averaged over the sky with equal weight per
unit solid angle $\Omega$, which we will denote by $\langle\cdots\rangle_\Omega$, it is the
inverse magnification that is conserved: $\langle\mu^{-1}\rangle_\Omega = 1$, at least
to the extent that multiple lensing is unimportant. But, as
with Weinberg’s argument, Kibble & Lieu also assume that
the area of the constant-$z$ surface is unperturbed.

Despite the conservation arguments, many lensing
analyses have continued to claim large effects in the mean.
Frequently, such calculations make use of Swiss-cheese
models. Kantowski, Vaughan & Branch (1995) and Kantowski
(1998), for example, claim to confirm Kantowski’s earlier
conclusions in his 1969 paper and show there should be large
effects for SNIa cosmology. Ellis, Bassett & Dunsby (1998)
claim that Weinberg’s assumption of invariance of area may
be strongly violated by strong lensing from small-scale
structure if one is considering observations of supernovae. Clifton
& Zuntz (2009) find a bias of a few percent in source
magnitudes using Swiss-cheese models. Bolejko (2011a), also
using Swiss-cheese models, finds that the distance to the CMB
last-scattering surface is strongly affected by structure, with
significant impact on cosmological parameter estimation.
Similar results are presented in Bolejko (2011b) and Bolejko
& Ferreira (2012). Bolejko (2011a) provides a very useful and
extensive review of other studies, some of which (e.g. Marra
et al. 2007) find large effects; some of which find effects at the
level of a few percent (which would still be significant if
correct); while others claim that the effect is very small. An
important example of the latter is Metcalf & Silk (1997);
they integrated the geodesic deviation equation, claiming
that the mean magnification is only a part-in-a-million
effect.

Clarkson et al. (2012) provide an extensive review of
source amplification statistics, focusing mostly on SNIa
observations but also touching on the implications for the
CMB. They claim that the mean magnification of a source
is

$$\langle\mu\rangle \simeq 1 + \langle 3\kappa^2 + \gamma^2\rangle + \ldots \tag{3}$$

where $\gamma$ is the usual first order image shear. This is in conflict
with Weinberg’s result, though it is qualitatively in line with
the expectation for $\langle\Delta D\rangle/D_0$ from averaging the focusing
equation, both in the sign and in the order of magnitude of
the effect, but would indicate potentially serious problems
for SNIa cosmology if correct. This group has carried out
a systematic analysis of the distance perturbation in second
order relativistic perturbation theory (Umeh et al. 2014a,b).
They calculate both the perturbation to the redshift and the
distance as a function of affine parameter (using the geodesic
and optical scalar equations respectively) and then solve the
resulting pair of parametric equations. A similar calculation
has been carried out by Marozzi (2014).

Most recently, Clarkson et al. (2014; hereafter
CUMD14) find that there is a perturbation to the relation
between distance and redshift that, in our notation, is

$$\langle D/D_0\rangle = 1 + \frac{3}{2}\langle\kappa^2\rangle. \tag{4}$$

They present several arguments to support this, and say
that “It implies that the total area of a sphere of constant
redshift will be larger than in the background”. They also
compute the perturbation to the proper area of a surface
of constant $z$ (the integral of $D^2$ over the observer’s sky)
using the optical scalars transport equation and find this
to be the square of the integral along the ray of the first
order perturbation $\Delta\theta$ to the rate of expansion $\theta = \dot{A}/2A$
of the ray. Here $A$ is the beam area expressed in conformal
background - i.e. ‘co-moving’ - coordinates and dot denotes
the derivative with respect to conformal distance; $\theta$ is not to
be confused with an angle. To zeroth order, and for a
spatially flat background, as we shall assume, $\theta$ is just the
inverse of the conformal distance. But at first order $\theta$ includes
the additional rate of change of the beam area caused by
inhomogeneity. Expressed in terms of the usual first order
convergence $\kappa$ their result for the area is

$$\langle D^2/D_0^2\rangle_\Omega = 1 + 4\langle\kappa^2\rangle. \tag{5}$$

This is also in direct conflict with Weinberg. Of this
CUMD14 say “This is a purely relativistic effect with no
Newtonian counterpart - and it is the first quantitative
prediction for a significant change in the background cosmology
when averaging over structure” (citing the review of
dynamical backreaction by Clarkson et al. 2011). They discuss how
this may be thought of as arising because of ‘crumpling’ of
the surface of constant redshift which enhances its area.

CUMD14 applied their results to compute the mean
perturbation to the distance of the cosmic photosphere in
terms of the matter density power spectrum; a significant
advance over calculations that use idealised spherical
Swiss-cheese models. They found the strength of the effect in
conventional models to be at the ~ 1% level. This, they argued,
might significantly affect CMB cosmological parameters -
in particular, resolving the tension between $H_0$ as inferred
from the CMB (Planck collaboration 2013) and via direct
distance methods (Riess et al. 2011; although see Efstathiou
2014).

A puzzling feature of the calculation is the sign of the
effects (4) and (5): both distance and area are increased by
structure. The first might seem to be opposite to the qualitative
expectation from the averaged focusing equation. The
latter seems to be at odds with (3); if the area of a surface of
constant $z$ around a source is increased then, following
Weinberg, one would think that conservation of photons would
imply that the mean flux density seen by observers on that
surface should be decreased.

Another surprising feature is that much of the effect
arises from quite small scale structure. The relevant
information in the CMB is encoded in the angular frequency $\ell$
of the ‘acoustic peaks’ in the power spectrum of the
temperature fluctuations. These arise from perturbations of
co-moving scale of order 100 Mpc. As mentioned, the mean
square convergence at the photosphere on this scale is only
$\sim 10^{-3}$ so it is hard to see how a ~ 1% effect arises. But the
mean squared convergence is a strongly decreasing function
of angular scale, scaling roughly inversely with angle, and
CUMD14 emphasise that their calculation obtains a large
contribution from lensing by structures down to ~ 10 kpc
scale. Again this is hard to understand: as argued by Ellis,
Bassett & Dunsby (1998), lensing by small scale structure
should not affect the angular size of extended objects such
as the acoustic peak scale features. However, no such
objection exists with SNIa cosmology, where any lensing
biases could indeed reflect the high small-scale variance in $\kappa$.
Thus the CUMD14 results can potentially induce a profound
change in the inferences about the cosmological model that
are normally drawn from high-$z$ SNIa (e.g. Riess et al. 1998;
Perlmutter et al. 1999).

1.3 Overview of the present paper 

In the work presented here, we dispute the above claims for
significant flux amplification of sources, or equivalently
significant violation of conservation of area, and we attempt
to clarify the situation and explain the apparently
discordant results that can be found in the literature. We also
show that, despite its name, the focusing theorem does not
indicate any tendency for inhomogeneities to cause
magnification on average.

In the first part of the paper we show how, under the
conventional assumption that the total area of a surface of
constant $z$ is unaffected by lensing, quantities such as the
mean distance-redshift relation are biased by lensing. If the
flux density $S$ is unbiased, then so is $4\pi S/L = 1/D^2$; thus
$\langle 1/D^2\rangle_A = 1/D_0^2$, when averaged over standard sources. But
the magnification is a fluctuating quantity, so there is a
dispersion in values of $1/D^2$ for different lines of sight or
different source regions. These therefore provide what are,
in effect, noisy estimates of $1/D_0^2$. If one takes a non-linear
function of $1/D^2$, such as $D$, and then averages, the noise
effectively gets rectified and inevitably $\langle D\rangle \neq D_0$ even though
$\langle 1/D^2\rangle$ itself is unbiased.

Figure 1. In a hypothetical universe with inhomogeneity in some
finite region of space, consider the mean fractional change to the
area of a surface of constant redshift, or cosmic time, which, in
the absence of structure, lies at comoving distance $\lambda_0$ (note that
our notation here differs from that of Weinberg 1976, who used $\lambda$
to denote affine parameter). We find that the area $dA'$ is biased,
but to an extremely small extent, as a result of two competing
effects: (1) the radius reached by light rays is reduced because
they are not straight; (2) the surface is ‘wrinkled’ owing to time
delays induced by the density fluctuations. Regarding the first
effect, a single lensing structure would cause a deflection $\theta_1 \sim \phi$
where $\phi$ is the metric perturbation (or the dimensionless
Newtonian potential) and the corresponding fractional decrease in
distance reached would be $\Delta r/r \sim \theta_1^2$. The effect of $N \sim \lambda/L$
of these structures with metric fluctuations of random sign -
assumed to have size $L$ and lying along a path length $\lambda$ - would be
$N$ times larger. So $\langle\Delta r\rangle/r \sim \langle\theta^2\rangle \sim \phi^2\lambda/L$ where $\langle\theta^2\rangle \sim N\theta_1^2$ is
the cumulative mean square deflection. As for the second effect,
one can draw an analogy with the surface of a swimming pool
perturbed by random waves of small amplitude. These cause a
fractional increase in the area of the surface that is on the order
of the mean square tilt of the surface. Here the surface is
perpendicular to the light rays, so we expect that the area increase
is also, to order of magnitude, $\langle\Delta A\rangle/A \sim \langle\theta^2\rangle$. Both effects are
caused predominantly by structures on scales of tens of Mpc, and
these give only a part-in-a-million effect, counter to much larger
recent claims from relativistic perturbation theory. This is the
main new result of this paper, discussed at length in §3.

We show that the claims for non-zero mean source
amplification or surface area bias in the calculations described
above arise partly from failing to make this distinction
between distance bias and flux-density bias, but mostly from
ignoring the distinction between averaging over sources and
averaging over direction. We find that the RHS of (3) is the
direction averaged (rather than source averaged)
amplification and (4) is the bias in the source-averaged distance, while
the direction averaged distance, which is more relevant for
CMB observations, is

$$\langle D/D_0\rangle_\Omega = 1 - \frac{1}{2}\langle\kappa^2\rangle. \tag{6}$$

The RHS of (5) is the source averaged inverse amplification
$\langle D^2/D_0^2\rangle_A$ rather than the average over the observer’s
sky (it also happens to be the direction average of $\mu$) and so
it does not reflect any increase in the area of the photosphere
or surface of constant $z$.
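
The distinction between the two kinds of average can be illustrated with a toy numerical sketch. It is not the calculation of this paper: it assumes Gaussian convergence with a hypothetical rms `sigma_kappa`, the lowest-order relation $\Delta\mu = 2\kappa$, exactly unperturbed area ($\langle\mu\rangle_A = 1$), and that an element of the source sphere occupies solid angle on the sky in proportion to $\mu$, so that direction averages are source averages re-weighted by $\mu$. The function name and parameters are illustrative only.

```python
import random


def direction_averages(sigma_kappa=0.03, n=200_000, seed=2):
    """Toy model: re-weight equal-area source elements by mu to average over sky."""
    rng = random.Random(seed)
    # Assumption: lowest-order weak lensing, mu = 1 + 2*kappa, Gaussian kappa.
    raw = [1.0 + 2.0 * rng.gauss(0.0, sigma_kappa) for _ in range(n)]
    norm = sum(raw) / n
    mu = [m / norm for m in raw]          # enforce <mu>_A = 1 (area unperturbed)
    total = sum(mu)

    def dir_avg(func):
        # Each equal-area element covers solid angle proportional to mu.
        return sum(m * func(m) for m in mu) / total

    kappa2 = sum((m - 1.0) ** 2 for m in mu) / n / 4.0   # <kappa^2> from <(dmu)^2>/4
    return {
        "inv_mu_dir": dir_avg(lambda m: 1.0 / m),    # Kibble & Lieu: should be 1
        "D_dir": dir_avg(lambda m: m ** -0.5),       # compare with equation (6)
        "D_dir_pred": 1.0 - 0.5 * kappa2,
        "mu_dir": dir_avg(lambda m: m),              # compare with 1 + 4<kappa^2>
        "mu_dir_pred": 1.0 + 4.0 * kappa2,
    }


if __name__ == "__main__":
    for name, value in direction_averages().items():
        print(f"{name:12s} {value:.6f}")
```

In this toy model the direction-averaged inverse magnification is unity by construction, the direction-averaged distance falls below $D_0$ by about $\langle\kappa^2\rangle/2$, and the direction-averaged magnification exceeds unity by about $4\langle\kappa^2\rangle$, all with the area held fixed.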

The rest of the paper consists of a calculation of the
perturbation to the area of a surface of constant redshift.
This is the net result of the competing effects of wiggling of
rays, which reduces the radius they reach, and the wrinkling
of the surface via time delays, which increases its area. We
show in the appendices, using both the geodesic equation
and the much more arduous route of the optical scalars
formalism, that the area bias is on the order
of the mean squared cumulative deflection angle, not the
much larger mean squared convergence. This means that,
at least as far as sub-horizon scale structure is concerned,
Weinberg’s flux-conservation argument is actually good to
about one part in a million, and no radical changes to SNIa
cosmological inferences need to be made. The calculation
is somewhat involved, but an (only slightly over-simplified)
order-of-magnitude argument for why this should be the case
is given in the caption to Figure 1.
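
The random-walk scaling used in the caption argument, $\langle\theta^2\rangle \sim N\theta_1^2$ with $N \sim \lambda/L$, can be checked with a minimal sketch. The values of `phi` and `L_mpc` below are assumptions chosen only for illustration (the text says no more than "tens of Mpc"), each structure is taken to deflect by exactly $\theta_1 = \phi$ with random sign in one dimension, and the path length is the 9800 Mpc figure quoted later in the text with $h$ ignored.

```python
import random


def cumulative_deflection(phi=3e-5, L_mpc=30.0, path_mpc=9800.0,
                          trials=2000, seed=3):
    """Monte Carlo estimate of the mean square cumulative deflection."""
    rng = random.Random(seed)
    n_struct = int(path_mpc / L_mpc)      # N ~ lambda / L structures along the path
    theta1 = phi                          # single-structure deflection, theta_1 ~ phi
    total = 0.0
    for _ in range(trials):
        # Sum N deflections of equal size and random sign.
        steps = sum(rng.choice((-1.0, 1.0)) for _ in range(n_struct))
        total += (steps * theta1) ** 2
    measured = total / trials             # Monte Carlo <theta^2>
    predicted = n_struct * theta1 ** 2    # random-walk expectation N * theta_1^2
    return n_struct, measured, predicted


if __name__ == "__main__":
    n_struct, measured, predicted = cumulative_deflection()
    print(n_struct, f"{measured:.2e}", f"{predicted:.2e}")
```

With these assumed inputs the mean square deflection comes out below $10^{-6}$, which is the order of magnitude the caption argues for; different but still reasonable choices of `phi` and `L_mpc` move the number by factors of a few, not by orders of magnitude.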

The outline of the paper is as follows: In §2 we
compute the statistical bias in quantities such as the apparent
distance under the assumption that area is unbiased by
lensing. In §2.1 we consider biases that arise when averaging over
sources. In §2.2, turning to the CMB, we consider the
statistics of quantities that are averaged over direction, rather
than averaging over sources. In §2.2.1 we consider the
argument of Kibble & Lieu (2005) that the direction averaged
inverse magnification is conserved, and in §2.3 we recall the
calculations of Metcalf & Silk (1997). In §2.4 we calculate
the mean inverse magnification caused by a thin screen of
lenses and find that its perturbation is zero, consistent with
Kibble & Lieu, and we discuss the generalisation of this to a
shell containing deflectors of a finite size. We then give the
statistical bias in the direction averaged distance and
magnification and show that the latter nicely accounts for (3).

In §3 we expand on the simple-minded argument in the
caption to Figure 1 and attempt to give a heuristic
explanation of the results of the detailed calculation presented
in appendix A. We note that the argument above is
over-simplified in one respect, but we show that this does not
significantly alter the basic conclusion that the area bias
is essentially zero. In §3.3 we identify the scale of
structures that dominate the ensemble effect on the area. In §3.4
we consider fluctuations about the ensemble average area
increase that we have calculated. We argue that for
sub-horizon scale density perturbations alone these are small, so
the area of one observer’s sky will be close to the ensemble
mean, and the mean fractional change to flux densities will
be close to $-\langle\Delta A\rangle/A_0$. But for horizon scale perturbations
there is a first order change to the area that is typically on
the order of the metric perturbation for these modes and is
actually larger in mean modulus than the ensemble mean
from sub-horizon scale structure. In §4.3 we discuss how
different ways of analysing CMB data could, in principle, result
in biased results, but argue that the conventional analysis
method (Hu 2000; Challinor & Lewis 2005) avoids this.

Appendix A contains the detailed calculation of the
mean perturbation to the photosphere area at second
order in the metric perturbations, arising from gravitational
time delays and the associated light path deflection (though
the result is obtained entirely as the average of the
products of first order quantities). There, in §A1, we describe
why the weak-field model for metric fluctuations provides
an adequate description and we recall the analogy between
light propagation in a weakly perturbed FRW cosmology
and light propagating in a medium with spatially varying,
but locally isotropic, refractive index (‘lumpy glass’). In §A2
we discuss the appropriate boundary conditions for the end
of the rays, and the distinction between surfaces of constant
$z$ and the cosmic photosphere (the latter being a surface of
constant optical path in the lumpy glass analogy).

The resulting ensemble mean for the fractional area
perturbation $\langle\Delta A\rangle/A_0$ emerges as a weighted integral along the
line of sight of

$$J \equiv -8\int dy\, \xi_\phi'(y)/y = 2\pi \int k\, \Delta_\phi^2(k)\, d\ln k, \tag{7}$$

where $\xi_\phi'$ is the derivative with respect to conformal (or
‘co-moving’) background coordinates of the two-point spatial
auto-correlation function of the dimensionless Newtonian
gravitational potential fluctuations (divided by $c^2$); $\Delta_\phi^2$ is
the dimensionless power spectrum of $\phi$ (variance per $\ln k$).
Physically, $J$ is the rate of change with respect to path length
of the ensemble mean square angular deflection of a ray. It is
similar to the ‘$J_3$’ integral (Peebles 1981) and is dominated
by large scale density fluctuations around the peak of the
matter power spectrum. This demonstrates rigorously that
the effect is on the order of the mean squared cumulative
deflection angle, and is therefore many orders of magnitude
smaller than the statistical biases such as in (3), (4), (5) and
(6).

If the potential fluctuations are non-evolving then
$\langle\Delta A\rangle/A_0 = (2/3)\lambda_0 J$ where $\lambda_0$ is the conformal distance to
redshift $z$ (in units where conformal distance has dimensions
of length). The value of $J$ in the ‘concordance’
cosmological model is $J \simeq 9.9 \times 10^{-11}\,h/{\rm Mpc}$ (this is the asymptotic
value at high redshift when the potential is non-evolving;
at low $z$ the potential decreases with time and $J$ falls to
about 60% of this value at $z = 0$). The overall path length
is $\lambda_0 \simeq 9800\,h^{-1}\,{\rm Mpc}$ so the net perturbation to the area of
the photosphere is $\langle\Delta A\rangle/A_0 \simeq 6 \times 10^{-7}$.
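
The arithmetic behind this estimate can be reproduced in a few lines. The sketch below simply evaluates $(2/3)\lambda_0 J$; the exponent of $J$ is taken to be $10^{-11}$, an assumption on our part that is chosen to agree with the part-in-a-million area perturbation stated elsewhere in the text, and the variable names are illustrative.

```python
def fractional_area_perturbation(j_per_length, path_length):
    """<Delta A>/A_0 = (2/3) * lambda_0 * J for non-evolving potential fluctuations."""
    return (2.0 / 3.0) * path_length * j_per_length


# Assumed inputs: the factors of h cancel in the product lambda_0 * J.
J_HIGH_Z = 9.9e-11      # h/Mpc, asymptotic high-redshift value (exponent assumed)
LAMBDA_0 = 9800.0       # Mpc/h, conformal distance to the photosphere

area_bias = fractional_area_perturbation(J_HIGH_Z, LAMBDA_0)
# Using the z = 0 value of J (about 60% of the above) along the whole path
# gives a rough lower bracket, since J actually varies with redshift.
area_bias_low = fractional_area_perturbation(0.6 * J_HIGH_Z, LAMBDA_0)

if __name__ == "__main__":
    print(f"upper bracket: {area_bias:.1e}, lower bracket: {area_bias_low:.1e}")
```

Both brackets are a few times $10^{-7}$, in line with the figure quoted above.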

We argue later in the paper that, while the calculation is performed
using perturbation theory, this is valid even if non-linear
lensing by very small scale structure causes the shear and
amplification of most lines of sight to high redshift to be
significant.

Several other technical calculations are consigned to
appendices. In these we calculate the first-order beam
expansion rate that is used in another appendix; we show how
the result of Metcalf & Silk’s calculation of
the mean magnification, while qualitatively very similar to
ours, differs at a detailed level, particularly in regard to the
effect from nearby lenses; we show how our
results can be obtained from the optical scalar formalism; and
we show how the non-vanishing inverse
magnification averaged over sources can be understood as arising
because light paths to sources tend to avoid over-dense
regions.

Although some of the detail in the appendices is
admittedly excessive in the face of what turns out to be a very
small correction, there is value in collecting this material
together. Flux conservation will probably continue to be of
great importance in gravitational lensing, and it is
important to understand the issue in depth. We hope the present
paper is a useful contribution to this process.


2 STATISTICAL BIASES 

In this section we show how quantities such as distance
can be statistically biased. We consider both averages over
sources and over directions, presenting the conservation
arguments of Weinberg (1976) and Kibble & Lieu (2005) and
showing how powers of the distance may or may not be
biased. We illustrate these general points with the specific case
of a thin deflecting screen.


2.1 Source averaged properties 

2.1.1 Photon conservation 

Weinberg (1976) argued that transparent lenses cannot
change the mean flux density of sources on the grounds of
conservation of the flux of photons. The idea is that if a
monochromatic source emits $N$ photons per period of the
emitted radiation then there must also be $N$ photons per
(redshifted) period passing through any surface of constant
redshift. Additionally, static lenses do not affect the redshift
of sources. So, while individual sources may be magnified or
de-magnified, and some may be multiply imaged, the
average fraction of photons from a source at redshift $z$ that we
detect is the ratio of our telescope aperture to the proper
area of the sphere around each source on which the redshift
has value $z$. Averaged over the observers that uniformly
populate the sphere around a particular source, the flux density
is thus unbiased.

To obtain the quantity of more interest, which is the
mean flux density of sources seen by one observer, one can
argue that, averaged over the entire ensemble of pairs of
sources and the observers who see them to have redshift $z$,
the flux density is also unbiased, and if we are not a
special observer the average over the sources that we see with
redshift $z$ should also have unbiased flux density. Weinberg
thus concluded that sources are, on average, unmagnified
and that the conventional formula for $D(z)$ remains valid.
In fact, as we show below, Weinberg’s result holds for every
observer, not merely in an ensemble-average sense.

This is a very powerful and general argument, which is
not restricted to the weak-lensing regime - though it does
require that multiple images of sources from strong lensing are
either unresolved or that the flux densities of the multiple
images have been aggregated. If we define the magnification
of a source $\mu$ as the ratio of its flux density to that which
an identical source would have at the same redshift in an
unperturbed FRW model, or viewed along a path with no
inhomogeneity, and imagine the source sphere at redshift $z$
to be tessellated into a very large number of equal area
elements, each containing one standard source, then averaging
over these sources is equivalent to averaging over area and
Weinberg’s argument is that $\langle\mu\rangle_A = 1$ where the subscript
indicates averaging $\mu$ weighted by area on the constant-$z$
surface.

The flux density is also inversely proportional to $dA/d\Omega$,
the Jacobian of the transformation between position on
the source plane and angle on the observer’s sky
(conservation of surface brightness means the flux density
increases with $d\Omega$ for given $dA$). The average of the inverse
of the Jacobian, weighted by area on the source sphere, is
$\langle d\Omega/dA\rangle_A = \int dA\,(d\Omega/dA) / \int dA = 4\pi/A$. We emphasise
that $\langle\cdots\rangle_A$ is not an ensemble average, but simply an
average over the source sphere. Multiple lensing is accounted for
because the $d\Omega$ for the different images add into a single
element of total solid angle. Invariance of mean flux density
is therefore equivalent to the assertion that the surface of
constant $z$ has the same proper area as would be the case if
the matter inhomogeneity were smoothed out.
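
The identity $\langle d\Omega/dA\rangle_A = 4\pi/A$ is purely a statement about weighting, which a discrete sketch makes explicit. The tessellation below is hypothetical: the element areas and solid angles are drawn at random and only their totals ($A$ and $4\pi$) are fixed, so nothing here models real lensing.

```python
import math
import random


def mean_inverse_jacobian(n_elements=1000, total_area=7.5, seed=4):
    """Area-weighted and unweighted means of dOmega/dA over a toy tessellation."""
    rng = random.Random(seed)
    # Hypothetical area elements dA_i, rescaled to sum to the total area A.
    raw_area = [rng.uniform(0.5, 1.5) for _ in range(n_elements)]
    d_area = [a * total_area / sum(raw_area) for a in raw_area]
    # Solid angles dOmega_i of the images, rescaled to tile the whole 4*pi sky.
    raw_omega = [rng.uniform(0.5, 1.5) for _ in range(n_elements)]
    d_omega = [o * 4.0 * math.pi / sum(raw_omega) for o in raw_omega]

    inv_jacobian = [o / a for o, a in zip(d_omega, d_area)]
    weighted = sum(a * j for a, j in zip(d_area, inv_jacobian)) / sum(d_area)
    unweighted = sum(inv_jacobian) / n_elements
    return weighted, unweighted


if __name__ == "__main__":
    weighted, unweighted = mean_inverse_jacobian()
    print(weighted, 4.0 * math.pi / 7.5, unweighted)
```

The area-weighted mean equals $4\pi/A$ whatever the individual elements do, whereas the unweighted mean of the same ratios does not; this is why the choice of weighting matters throughout this section.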

2.1.2 Distance bias 

Weinberg’s endorsement of the conventional formula for
$D(z)$ does not imply that the distance, averaged over
sources, is unaffected by lensing. Rather, the mean flux
density of standard candles uniformly or randomly distributed
over the constant-$z$ surface is unperturbed; i.e. the
average of $1/D^2$ is the same as its value in a uniform universe.
Now, the distance is a non-linear function of $1/D^2$, as is the
magnitude, and $1/D^2$ is a quantity that fluctuates between
different lines of sight (having a first order fractional
perturbation $2\kappa$ in the linear regime). As a result, the distance- and
magnitude-redshift relations are both biased with respect to
the conventional formula for $D(z)$.

Estimating this bias for point-like sources is difficult
since small-scale structure may cause large fluctuations in
the magnification for narrow beams. Consider a (possibly
fictitious, though of the kind considered in perturbation
theory) universe with only small amplitude surface
density perturbations. The distance is proportional to $\mu^{-1/2}$
which can be expanded, with $\Delta\mu = \mu - 1$, as $D/D_0 \simeq
1 - \Delta\mu/2 + 3(\Delta\mu)^2/8 + \ldots$. The average over sources of the
linear term vanishes, according to Weinberg, but the second
order term does not average to zero. Instead, there is a
statistical bias in $D$, with respect to its value in a homogeneous
universe $D_0$, of

$$\langle D/D_0\rangle_A = 1 + \frac{3}{8}\langle(\Delta\mu)^2\rangle_A + \ldots = 1 + \frac{3}{2}\langle\kappa^2\rangle + \ldots \tag{8}$$

where the second equality, involving the mean squared weak
lensing convergence $\kappa$, applies in the perturbative regime
where $\Delta\mu = 2\kappa + \ldots$. Note that as the average distance
perturbation is second order we do not need to specify whether
the average of $\kappa^2$ is weighted by area or solid angle as the
difference between these is a third order effect.

Similarly the average of D^/Dq = is readily found 
to be 

Dq)a = 1 -I- {(Ap)^)a -|- ... = 1 -|- 4(k^) -|- ... (9) 

These are precisely the same as the distance and area
perturbations found by CUMD14. But clearly (9) is not
the perturbation to the area: that would be the average over
directions rather than over source-plane area, whereas (9) is
the average over sources of $D^2/D_0^2$ assuming that the area
is actually precisely unperturbed.
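
The size of the source-averaged biases in (8) and (9) can be checked with a small numerical sketch. It assumes a toy population, not taken from the text, in which half the sources have $\Delta\mu = +\epsilon$ and half have $\Delta\mu = -\epsilon$, so that the mean magnification is unity by construction; the function name is illustrative.

```python
def source_averaged_biases(eps):
    """Toy model: half the sources have mu = 1 + eps, half have mu = 1 - eps.

    Returns (<mu>_A, <D/D0>_A - 1, <D^2/D0^2>_A - 1), using D/D0 = mu**-0.5.
    """
    mus = (1.0 + eps, 1.0 - eps)
    mean_mu = sum(mus) / 2.0
    bias_d = sum(mu ** -0.5 for mu in mus) / 2.0 - 1.0
    bias_d2 = sum(1.0 / mu for mu in mus) / 2.0 - 1.0
    return mean_mu, bias_d, bias_d2

eps = 0.02                  # Delta mu = 2*kappa, so this is kappa = 0.01
mean_mu, bias_d, bias_d2 = source_averaged_biases(eps)
var_mu = eps ** 2           # <(Delta mu)^2>_A for this toy distribution
# The two ratios should be close to 3/8 and 1, as in (8) and (9).
print(mean_mu, bias_d / var_mu, bias_d2 / var_mu)
```

Even though the mean magnification is exactly one in this toy model, both the mean distance and the mean squared distance are biased at second order in the scatter.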

The applicability of these formulae to point-like sources
in the real Universe is somewhat questionable since galaxy
clustering observations tell us that $\langle\kappa^2\rangle$ grows roughly
inversely with scale while the effective beam size, which
introduces a cut-off, is tiny and extrapolation is difficult.
Ellis, Bassett & Dunsby (1998; hereafter EBD98) argue quite
convincingly that this ‘ultraviolet divergence’ problem for
$\langle\kappa^2\rangle$ is potentially real and should not be ignored, though
this is constrained empirically by modelling of the scatter
in supernova flux densities and, out to $z \sim 1$ at least, any
enhancement in the scatter from lensing is small (Sullivan et
al. 2011; Conley et al. 2011). The bias estimated from large-scale
structure alone, however, would apply in a hypothetical
observation where measurements of the average flux density
are made on patches of sky containing large numbers of
sources and the inverse square roots of these then averaged.
It should also correctly describe angular area magnification
of structures in the CMB.


2.1.3 Ellis, Bassett & Dunsby’s objection 

A weakness of Weinberg’s argument, as was emphasised by
EBD98, is that he assumes that the surface of constant $z$ is
a sphere and that its area is unaffected by structures along
the line of sight. It is true that static lenses have little effect
on the redshift of sources, but in the real universe the set of
observers who see a source to have redshift $z$ at some time $t$
do not lie on a sphere. Rather, the surface will in general
be slightly aspherical because of time delays associated with
the inhomogeneity, and if there are caustics it will be folded
over on itself on small scales, so along any light path from
the source there may be multiple observers at slightly different
distances who see the source to have redshift $z$ (each
one of these observers will see multiple images with very
slightly different redshifts). Similarly the set of sources that
we perceive to have redshift $z$ at the present will lie on some
aspherical and generally microscopically multi-foliated surface,
a section of which is illustrated schematically, though
in grossly exaggerated form, in Figure 2.

EBD98’s focus is on the effect of small-scale structure
and its associated caustics. They emphasise the UV-divergence
mentioned before and how this may in principle
significantly increase the observed areas corresponding to a
net solid angle even when averaged over large angular scales.
This seems to us to be beside the point. The effect of folding
of the surface is already taken into account by Weinberg in
requiring that multiply imaged sources are either unresolved
or their flux densities be aggregated. More significant is how
much the area is biased, not counting the small-scale folding.
Referring again to Figure 2, we would argue that the
relevant question is: what is the effect of structure on the
area of the outer surface? We answer this in §3. For now we
assume that there is no effect, and turn to consider direction
averages which are more relevant for CMB studies.


2.2 Direction averaged properties 

The studies mentioned above were mostly concerned with
the magnification of point sources. Regarding the lensing of
anisotropies of the CMB, many studies have followed the
pioneering work by Cole & Efstathiou (1989). Here we shall
focus only on the issue of the mean magnification, reviewing
the argument presented by Kibble & Lieu (2005): when
averaging over directions on the sky, it is the inverse
magnification that is conserved; we also discuss how sky- and
source-averages are related.
Figure 2. Grossly exaggerated illustration of the form of the surface
of constant redshift in the case of strong lensing. The lines are
rays of light that start on, and are perpendicular to, a wavefront
on the left. This surface is distorted as a result of time delays
induced by the lenses that the light has previously encountered
(not shown). The rays are propagated to a constant redshift surface
on the right. This can either be viewed as the surface of
sources that an observer sees to have redshift $z$ at some epoch,
or as the surface around a source hosting observers who see that
source to have redshift $z$. Weinberg’s flux conservation argument
relies on the assumption that e.g. the area of the outer surface
here is identical to the area of a sphere of the same constant $z$
in an unperturbed universe. If it is, the flux density, averaged
over observers on this surface, is the same as for a homogeneous
universe. In reality, this surface is slightly deformed, and its area
is biased, so the mean flux density is not precisely unbiased. But
as we argued in the caption to Figure 1 and discuss further in
§3 and in appendix A, the bias is predominantly caused by large-scale
density perturbations that are well understood, and the bias
is extremely small and, for all practical purposes, negligible.


2.2.1 Conservation of inverse magnification 

Kibble & Lieu discussed the average magnification using a 
model of uncorrelated random clumps of matter. But more 
significantly they emphasised the important and general dis¬ 
tinction between averages over sources - or equivalently over 
areas on the source plane - and averages over directions on 
the sky (i.e. averages weighted by solid angle): 

“We may choose at random one of the sources at redshift z, 
or we may choose a random direction in the sky and look for 
sources there. These are not the same; the choices are differently 
weighted. If one part of the sky is more magnified, or at a closer 
angular-size distance, the corresponding area of the constant-z 
surface will be smaller, so fewer sources are likely to be found 
there. In other words, choosing a source at random will give on 
average a smaller magnification or larger angular-size distance.” 

For source averaging, Kibble & Lieu reason that since
the distance is, by definition, $D = \sqrt{dA/d\Omega}$ and the flux
density $S$ is proportional to $1/D^2$ then, if $D_0$ is the distance
for a standard source viewed along an unperturbed path,
the amplification is $\mu = D_0^2/D^2$ and its average over area on
the source (or observer) surface is

$$\langle\mu\rangle_A = D_0^2\,\frac{\int dA\,(d\Omega/dA)}{\int dA} = \frac{4\pi D_0^2}{A}. \tag{10}$$

We have already invoked this result above in saying that
Weinberg’s result $\langle\mu\rangle_A = 1$ implicitly assumes that the area
is $A = 4\pi D_0^2$ and is unaffected by lensing.

For direction averaging, they show that a precisely anal¬ 
ogous statement can be made concerning 


$$\langle\mu^{-1}\rangle_\Omega = \frac{\int d\Omega\,(dA/d\Omega)}{D_0^2\int d\Omega} = \frac{A}{4\pi D_0^2} \tag{11}$$

so, again if one assumes the total area $A$ is unperturbed, it
is the direction average of $\mu^{-1}$ that is conserved.

In the absence of strong lensing both of the above results
are unexceptionable. But with multiple imaging the last step
in (11) is questionable: if an element of surface area can
be reached via paths that start in disjoint elements of solid
angle, it would be counted multiple times - so that one would
expect $\int d\Omega\,(dA/d\Omega)$ to be greater than $A$. Kibble & Lieu
claim that (11) is of general validity, but in doing so they
take a very different definition of magnification than the
one employed here. Rather than taking $D_0^2\mu^{-1}$ to be the
modulus of $dA/d\Omega$, they include the sign of the Jacobian of
the transformation from angle to area coordinates, so that
for some images $\mu$ is formally negative. When there are
multiple images, and in general there are an odd number
$2n+1$ of these, then $n$ of them have odd parity (Blandford
& Narayan 1986); these therefore have negative Jacobian,
which effectively cancels the multiple counting of areas. In
(10) the integral over area is understood to be over the outer
surface - which has a one-to-one mapping to solid angle -
and the parity of the outer surface is, as shown again by
Blandford & Narayan, always even. Since the parity is not
easily observable, (11) is of limited practical utility when
there are strong lenses. But to the extent that strong lensing
can be ignored - if the optical depth is very low or if one is
concerned with unresolved compact sources or with the size
of large structures (such as acoustic peak scale ripples in the
CMB) - then it is the mean of the inverse of the absolute
magnification that is conserved.

These results can also be understood in terms of the
probability distribution for amplification. One can imagine
calculating $\mu = D_0^2\,d\Omega/dA$ for an ensemble of rays fired in
random directions and propagated a path length $D_0$. Denoting
the probability distribution for $\mu$ in such an experiment
by $P_\Omega(\mu)$ then $P_\Omega(\mu)\,d\mu$ is the fraction of solid angle for
which $\mu$ lies in a range $d\mu$ around $\mu$, so $P_\Omega(\mu)\,d\mu = d\Omega/4\pi$.
If there are no multiple images, the element $d\Omega$ maps to an
area $dA = D_0^2\,d\Omega/\mu$. The fraction of the total area is thus
$dA/A = D_0^2\,d\Omega/(\mu A) = (4\pi D_0^2/A)\,\mu^{-1}P_\Omega(\mu)\,d\mu$; but this must
also be equal to $P_A(\mu)\,d\mu$, where $P_A(\mu)$ is the probability
distribution for $\mu$ over area, so the two probability distribution
functions are related by $P_A(\mu) = (4\pi D_0^2/A)\,\mu^{-1}P_\Omega(\mu)$.
This gives

$$\langle\mu\rangle_A = \frac{\int d\mu\,\mu\,P_A(\mu)}{\int d\mu\,P_A(\mu)} = \frac{4\pi D_0^2}{A}\,\frac{\int d\mu\,P_\Omega(\mu)}{\int d\mu\,P_A(\mu)} = \frac{4\pi D_0^2}{A} \tag{12}$$

and

$$\langle\mu^{-1}\rangle_\Omega = \frac{\int d\mu\,\mu^{-1}P_\Omega(\mu)}{\int d\mu\,P_\Omega(\mu)} = \frac{A}{4\pi D_0^2}\,\frac{\int d\mu\,P_A(\mu)}{\int d\mu\,P_\Omega(\mu)} = \frac{A}{4\pi D_0^2} \tag{13}$$

consistent with (10) and (11). Putting these together shows
that $\langle\mu\rangle_A\,\langle\mu^{-1}\rangle_\Omega = 1$, so conservation of one implies
conservation of the other. And clearly both rest on the assumption
that area is conserved.

If this assumption is correct then because $\mu^{-1}$ is a fluctuating
quantity one would expect $\langle\Delta\mu\rangle_\Omega \neq 0$. Writing
$\Delta\mu = (1 + \Delta\mu^{-1})^{-1} - 1$ (where $\Delta\mu^{-1} \equiv \mu^{-1} - 1$) and
expanding gives $\langle\Delta\mu\rangle_\Omega = -\langle\Delta\mu^{-1}\rangle_\Omega + \langle(\Delta\mu^{-1})^2\rangle_\Omega + \ldots$.
But $\langle\mu^{-1}\rangle_\Omega = 1$ means the first term is zero and, since
$\Delta\mu^{-1} = -2\kappa + \ldots$, we would have, in the perturbative
regime, $\langle\mu\rangle_\Omega = 1 + 4\langle\kappa^2\rangle + \ldots$ (which, we note, is the same
as $\langle\mu^{-1}\rangle_A$ obtained in §2.1.2).


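An alternative way to get this result is to note that conservation
of area $\int dA = 4\pi D_0^2$ and the definition of magnification
$\mu = D_0^2\,d\Omega/dA$ imply that for any function of the
magnification $F(\mu)$ the different averages are related by

$$\langle F\rangle_A = \frac{\int dA\,F}{\int dA} = \frac{\int d\Omega\,(dA/d\Omega)\,F}{D_0^2\int d\Omega} = \frac{\int d\Omega\,\mu^{-1}F}{\int d\Omega} = \langle\mu^{-1}F\rangle_\Omega \tag{14}$$

and similarly

$$\langle F\rangle_\Omega = \langle\mu F\rangle_A, \tag{15}$$

these relations being exact to the extent that the optical
depth for multiple images is small. With $F = \mu$ in (15) we
have $\langle\mu\rangle_\Omega = \langle\mu^2\rangle_A = \langle 1 + 2\Delta\mu + (\Delta\mu)^2\rangle_A$ so
$\langle\Delta\mu\rangle_\Omega = \langle(\Delta\mu)^2\rangle_A$ is exact (though
$\langle\Delta\mu\rangle_\Omega = 4\langle\kappa^2\rangle$ is only true to 2nd order precision).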
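
Relations (14) and (15) are easy to verify on a discretised sky. The sketch below is illustrative only: it assumes equal-solid-angle patches with randomly drawn magnifications (a hypothetical distribution), no multiple imaging, and it imposes area conservation by rescaling so that the sky average of $\mu^{-1}$ is exactly one.

```python
import random

def sky_and_source_averages(mu, func):
    """Equal-solid-angle sky patches with magnifications mu[i] > 0.

    Each patch maps to a source-plane area proportional to 1/mu[i], so
    source (area) averages weight patch i by 1/mu[i].
    Returns (<F>_Omega, <mu F>_A) for F = func(mu).
    """
    n = len(mu)
    area = [1.0 / m for m in mu]
    total_area = sum(area)
    f_sky = sum(func(m) for m in mu) / n
    mu_f_src = sum(a * m * func(m) for a, m in zip(area, mu)) / total_area
    return f_sky, mu_f_src

random.seed(1)
raw = [1.0 + random.uniform(-0.1, 0.1) for _ in range(1000)]
# Impose area conservation: rescale so the sky average of 1/mu is exactly 1.
norm = sum(1.0 / m for m in raw) / len(raw)
mu = [m * norm for m in raw]

# With F = mu, (15) says <mu>_Omega = <mu^2>_A.
mean_mu_sky, mean_mu2_src = sky_and_source_averages(mu, lambda m: m)
dmu_sky = mean_mu_sky - 1.0
dmu2_src = sum((m - 1.0) ** 2 / m for m in mu) / sum(1.0 / m for m in mu)
print(mean_mu_sky, mean_mu2_src, dmu_sky, dmu2_src)
```

The last two printed numbers agree to rounding error, illustrating that the sky-averaged magnification excess equals the source-averaged variance whenever the total area is conserved.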


2.3 Geodesic deviation calculations 

Metcalf & Silk (1997; hereafter MS97) used geodesic deviation
(rather than using the optical scalar formalism as in
most other studies) to calculate the magnification of the cosmic
photosphere to second order precision. With a COBE
normalised power spectrum of density perturbations (Bennett
et al. 1996) they found that lensing produces a non-zero
mean magnification of structures on surfaces of constant redshift
if weighted by solid angle on the sky, but it is only at
the $\sim 10^{-6}$ level (it is on the order of $\langle\theta^2\rangle$) and so is, for
all practical purposes, observationally negligible. This would
seem to say $\langle\mu\rangle_\Omega = 1$, where the subscript denotes an average
over direction (i.e. averaging with equal weight per unit
solid angle).

That, however, would be at odds with Kibble & Lieu
whose conservation of inverse amplification implies, as we
have seen, a relatively large $O(\langle\kappa^2\rangle)$ bias in $\langle\mu\rangle_\Omega$. The
resolution, however, is straightforward; the quantity that MS97
calculate is actually the mean inverse magnification, as we
now show.

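MS97 calculate the trace of the distortion tensor
$\mathbf{D} = d\,\delta\boldsymbol{\theta}/d\boldsymbol{\theta}$ (where $\boldsymbol{\theta}$ denotes the 2D angular position vector
on the flat sky), and the expectation value of its integral that
gives the parallel component of the change in separation on
the source plane of pairs of beams on the sky of separation
$\mathbf{s}$:

$$\beta_\parallel(|\mathbf{s}|) = \int_{-\mathbf{s}/2}^{\mathbf{s}/2} \mathbf{s}\cdot\langle\mathbf{D}\rangle\cdot d\boldsymbol{\theta}. \tag{16}$$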


For $s = |\mathbf{s}|$ smaller than the angle subtended by the coherence
length, and dividing by $s$ to get a fractional quantity,
this is $\beta_\parallel/s = \langle{\rm Tr}(\mathbf{D})\rangle$, i.e. the trace of the distortion.
This is not the magnification. Nor, in general, is it the
inverse magnification, which is the determinant $|\mathbf{D}|$. However,
if we write the distortion as $\mathbf{D} = \mathbf{I} + \mathbf{S}_1 + \mathbf{S}_2 + \ldots$,
where $\mathbf{I}$ is the identity matrix and subscripts denote terms
that are of 1st order, 2nd order etc. in the potential, then
$|\mathbf{D}| = 1 + {\rm Tr}(\mathbf{S}_1) + (|\mathbf{S}_1| + {\rm Tr}(\mathbf{S}_2)) + \ldots$ up to second order.
The trace of the first order term vanishes for a random fluctuating
potential with zero mean. It turns out (see below)
that if the potential fluctuations are statistically spatially
homogeneous the ensemble average of the determinant of
the first order distortion vanishes also, $\langle|\mathbf{S}_1|\rangle = 0$, so, for
random lenses, the mean inverse amplification perturbation
is just the trace of $\langle\mathbf{D}\rangle$.

We show in the appendix that MS97’s result can be
expressed as


.^0 

= 1 - ^ y dA (Ao - A) J 
0 


(17) 


where $\lambda$ is conformal distance along the path and $J$ is as defined
in the appendix. If the lensing structures have ‘coherence length’
$L$ then $J \sim \langle\phi^2\rangle/L$ so $\langle\mu^{-1}\rangle_\Omega - 1 \sim \langle\phi^2\rangle\lambda/L$, consistent with
the hand-waving argument in the caption to Figure 1 that
this is on the order of the mean square cumulative deflection
angle. As we shall see, however, the actual effect differs from
(17), particularly for lenses close to the observer, but (17)
has the correct order of magnitude. In any case, MS97’s result
is not in conflict with Kibble & Lieu, which is the main
point of this section.


2.4 Effect of a thin lensing screen 

Further evidence against there being any $O(\langle\kappa^2\rangle)$ perturbation
to $\langle\mu^{-1}\rangle_\Omega = 1$ comes from considering lensing by a
single deflecting screen or shell at conformal distance $\lambda_d$;
this is similar in principle to, but much simpler than, the
full 3D calculation of MS97. As we shall see in the following
sub-section, this also sheds light on claims for significant
source-averaged flux amplification.

In this model rays travel along straight paths in conformal
coordinates, receiving a small transverse deflection
at the lensing screen. As we are primarily concerned with
small structures, a reasonable first approximation is to work
in the ‘flat-sky’ limit where both the screen and the source
surface are assumed to be planar, using a 2-D Cartesian coordinate
system to describe deflections and displacement of
rays for a beam that propagates along the $z$ axis. The matrix
relating positions $\mathbf{x}'$ on the source plane (scaled by $\lambda_d/\lambda_s$)
to positions on the deflector plane $\mathbf{x}$ is

$$\frac{d\mathbf{x}'}{d\mathbf{x}} = \begin{bmatrix} 1-\kappa+\gamma_1 & \gamma_2 \\ \gamma_2 & 1-\kappa-\gamma_1 \end{bmatrix} \tag{18}$$

where $\kappa = \nabla_\perp^2\Phi$ and $\{\gamma_1,\gamma_2\} = -\{\Phi_{11}-\Phi_{22},\,2\Phi_{12}\}$ with
$\Phi_{ij} = \partial^2\Phi/\partial x_i\partial x_j$ and $\Phi = [\lambda_d(\lambda_s-\lambda_d)/\lambda_s]\int d\lambda\,\phi$ where $\phi$
is the Newtonian potential and the integration is through the
deflecting shell. Thus $\kappa$ is the usual weak lensing convergence
(the surface density in units of critical value) and $\gamma$ the
image shear. It follows from the definition of $\kappa$ and $\gamma$ that
$\kappa^2 - \gamma^2 = 4(\Phi_{11}\Phi_{22} - \Phi_{12}^2) = 4|\nabla_\perp\nabla_\perp\Phi|$.

The determinant of this matrix is the inverse magnification:

$$\mu^{-1} = (1-\kappa)^2 - \gamma^2 = 1 - 2\kappa + \kappa^2 - \gamma^2. \tag{19}$$

At linear order this is just $1 - 2\kappa$, and the average of $\kappa$
over directions from the observer - which is equivalent to
an average over the deflecting screen - vanishes, so the net
averaged non-linear inverse magnification is
$\langle\mu^{-1}\rangle_\Omega = 1 + \langle\kappa^2 - \gamma^2\rangle$.

In the ‘flat-sky’ limit we can write the 2D lensing potential
$\Phi(\mathbf{x})$ as a Fourier sum $\Phi(\mathbf{x}) = \sum_{\mathbf{k}}\Phi_{\mathbf{k}}\exp(i\mathbf{k}\cdot\mathbf{x})$. For a
statistically homogeneous random deflection screen, the expectation
value of the product of the potential coefficients
for distinct Fourier modes vanishes, so
$\langle\Phi_{\mathbf{k}}\Phi^*_{\mathbf{k}'}\rangle = P_\Phi(\mathbf{k})\,\delta_{\mathbf{k}\mathbf{k}'}$
with $P_\Phi(\mathbf{k})$ the power spectrum. This follows directly from
the assumed translational invariance of the statistical properties
of the random deflector screen. An immediate consequence
is that $\langle\kappa^2-\gamma^2\rangle = 4\langle|\nabla_\perp\nabla_\perp\Phi|\rangle = 4\langle\Phi_{11}\Phi_{22} - \Phi_{12}^2\rangle =
4\sum_{\mathbf{k}}P_\Phi(\mathbf{k})\,(k_x^2k_y^2 - (k_xk_y)^2) = 0$. Thus the 2nd order
contributions to the inverse magnification, $\kappa^2 - \gamma^2$, average to zero
when we average over positions on the deflection screen, or
equivalently over direction at the observer. The direction averaged
inverse amplification in this model is therefore unity,
consistent with Kibble & Lieu. Note that we have not imposed
any restriction on the strength of the lensing screen;
however, as with (11), our approach is of questionable utility
for strong lensing as the inverse magnification is the Jacobian,
which will be negative for some directions, rather than
the modulus of $D_0^{-2}\,dA/d\Omega$.
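
The cancellation of $\langle\kappa^2 - \gamma^2\rangle$ can be seen numerically for a screen built from a finite set of Fourier modes. The sketch below is an illustration under stated assumptions rather than the calculation in the text: it uses a $2\pi$-periodic box, integer wavevectors with arbitrarily chosen amplitudes and phases, and replaces the ensemble average by a spatial average over the box.

```python
import math
import random

def screen_second_order_average(modes, n_grid=16):
    """Average kappa^2 and kappa^2 - gamma^2 over one period of a 2-D screen.

    modes: list of (kx, ky, amplitude, phase) with integer kx, ky, defining
    Phi(x, y) = sum amplitude * cos(kx*x + ky*y + phase) on a 2*pi box.
    Uses kappa = Phi_11 + Phi_22 and gamma = -(Phi_11 - Phi_22, 2*Phi_12).
    n_grid must exceed twice the largest |kx| or |ky| to avoid aliasing.
    """
    sum_k2 = 0.0
    sum_diff = 0.0
    for ix in range(n_grid):
        for iy in range(n_grid):
            x = 2.0 * math.pi * ix / n_grid
            y = 2.0 * math.pi * iy / n_grid
            p11 = p22 = p12 = 0.0
            for kx, ky, amp, phase in modes:
                # Second derivatives of cos(k.x + phase) bring down -k_i k_j.
                c = -amp * math.cos(kx * x + ky * y + phase)
                p11 += kx * kx * c
                p22 += ky * ky * c
                p12 += kx * ky * c
            kappa = p11 + p22
            g1, g2 = -(p11 - p22), -2.0 * p12
            sum_k2 += kappa ** 2
            sum_diff += kappa ** 2 - (g1 ** 2 + g2 ** 2)
    n = n_grid ** 2
    return sum_k2 / n, sum_diff / n

random.seed(0)
modes = [(kx, ky, random.uniform(0.5, 1.0) * 1e-3, random.uniform(0.0, 2.0 * math.pi))
         for kx in range(-3, 4) for ky in range(0, 4) if (kx, ky) != (0, 0)]
mean_k2, mean_diff = screen_second_order_average(modes)
print(mean_k2, mean_diff)
```

The mean of $\kappa^2$ is non-zero while the mean of $\kappa^2 - \gamma^2$ vanishes to rounding error, so $\langle\gamma^2\rangle = \langle\kappa^2\rangle$ for this screen.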

To obtain the actual effect - which does not vanish -
one needs to allow for the finite size of the deflecting structures
and compute the deflection with post-Born corrections
allowing for the non-flatness of the sky etc. If we imagine
gravitational potential fluctuations of size $L$ - the ‘coherence
scale’ - and consider a shell of such objects around us then
e.g. the convergence is
$\kappa = [\lambda_d(\lambda_s-\lambda_d)/\lambda_s]\int d\lambda\,\nabla_\perp^2\phi \sim \lambda\int d\lambda\,\nabla_\perp^2\phi$
(where we are assuming a ‘typical’ distance
to the screen; i.e. not very close to the observer or to the
sources). At first order this integral can be taken along the
unperturbed path. At next order one must allow for the 1st
order deviation of the ray from the unperturbed path by a
perpendicular displacement $\Delta x_\perp \sim \nabla_\perp\phi\,L^2 \sim \phi L$. Allowing
for this might suggest a second order contribution to $\kappa$ whose
ensemble average is non-zero:
$\langle\kappa\rangle \sim \lambda\int d\lambda\,\Delta\mathbf{x}_\perp\cdot\nabla_\perp\nabla_\perp^2\phi \sim (\lambda/L)\phi^2$.
Comparing this to $\langle\kappa^2\rangle \sim (\lambda/L)^2\phi^2$ we see that
this is much smaller. In fact, as we shall see, and as is suggested
by (17), there is no effect that is of first order in $\lambda/L$
as there are other corrections that cancel. The leading order


effect of a single thick screen is $\langle\Delta\mu^{-1}\rangle_\Omega \sim \phi^2$, independent
of $\lambda/L$.


2.4.1 Direction averaged distance and magnification for a
thin screen


While the mean of the determinant (19) is unity, the same
is not true for its square root, or equivalently the apparent
distance $D/D_0 = \mu^{-1/2}$. Expanding this for small $\kappa$, $\gamma$ and
keeping only up to second order contributions gives

$$D/D_0 = 1 - \kappa - \gamma^2/2 + \ldots \tag{20}$$

Taking the ensemble average of this, the first order term vanishes
and since $\langle\gamma^2\rangle = \langle\kappa^2\rangle$ we have

$$\langle\Delta D/D_0\rangle_\Omega = -\langle\kappa^2\rangle/2 + \ldots \tag{21}$$

similar to (8) but with $-1/2$ in place of $+3/2$. We can
also obtain this from $\langle\mu^{-1}\rangle_\Omega = 1$ much as we did for the
source averaged distance, since
$D/D_0 = (1 + \Delta\mu^{-1})^{1/2} = 1 + \Delta\mu^{-1}/2 - (\Delta\mu^{-1})^2/8 + \ldots$
where again averaging over
directions the first order term vanishes and we can use
$\Delta\mu^{-1} = -2\kappa + \ldots$.


Similarly if we take the inverse of (19) and expand we
have, up to 2nd order,

$$\mu = 1 + 2\kappa + (3\kappa^2 + \gamma^2) + \ldots \tag{22}$$

and taking the average over the deflector surface the linear
term goes away and we have $\langle\mu\rangle = 1 + \langle 3\kappa^2 + \gamma^2\rangle + \ldots$. This
would seem to be the origin of the result of Clarkson et
al. 2012 for the mean amplification of sources. But averaging
over the deflector surface is an average over directions on
the sky, not an average over sources. We saw in the previous
section that $\langle\kappa^2 - \gamma^2\rangle = 0$ for a statistically homogeneous
screen, so $\langle\kappa^2\rangle = \langle\gamma^2\rangle$ and the above is
$\langle\mu\rangle_\Omega = 1 + 4\langle\kappa^2\rangle + \ldots$,
consistent with the result given at the end of §2.2.1.

Finally, we can note the further consequence that the
mean convergence to sources is biased low. Averaging (22)
over area, and using $\langle\gamma^2\rangle = \langle\kappa^2\rangle$, we see that

$$\langle\kappa\rangle_A = -2\langle\kappa^2\rangle. \tag{23}$$

The interpretation of this result is discussed further in the
appendix.
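
The direction- and source-averaged results (21) to (23) can be illustrated with the simplest possible screen. The sketch below assumes a single plane-wave mode, $\kappa(x) = \epsilon\cos x$ (an illustrative choice, not from the text), for which $\gamma^2 = \kappa^2$ at every point and so $\mu^{-1} = 1 - 2\kappa$ exactly; source averages are formed by weighting each point on the screen by the area element $\mu^{-1}$, as in (14).

```python
import math

def plane_wave_screen_averages(eps, n_grid=256):
    """Single-mode screen with kappa(x) = eps*cos(x), for which
    gamma^2 = kappa^2 and hence mu^-1 = 1 - 2*kappa (requires eps < 0.5).

    Sky averages use equal weight per grid point on the deflector plane;
    source averages weight each point by the area element mu^-1.
    """
    kappas = [eps * math.cos(2.0 * math.pi * i / n_grid) for i in range(n_grid)]
    inv_mu = [1.0 - 2.0 * k for k in kappas]
    n = float(n_grid)
    return {
        "kappa2_sky": sum(k * k for k in kappas) / n,
        "inv_mu_sky": sum(inv_mu) / n,
        "mu_sky": sum(1.0 / w for w in inv_mu) / n,
        "dist_sky": sum(math.sqrt(w) for w in inv_mu) / n,
        "kappa_src": sum(k * w for k, w in zip(kappas, inv_mu)) / sum(inv_mu),
    }

avg = plane_wave_screen_averages(0.01)
k2 = avg["kappa2_sky"]
print(avg["inv_mu_sky"] - 1.0)        # inverse magnification is conserved
print((avg["mu_sky"] - 1.0) / k2)     # close to 4
print((avg["dist_sky"] - 1.0) / k2)   # close to -1/2, as in (21)
print(avg["kappa_src"] / k2)          # close to -2, as in (23)
```

The same screen therefore gives a conserved mean inverse magnification, a positive sky-averaged magnification bias, a negative sky-averaged distance bias, and a mean convergence to sources that is biased low.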


3 AREA BIAS 


As we have emphasised, the above conservation theorems
depend on the assumption that the source surface has an
area that is unaffected by metric fluctuations:

$$\langle\mu\rangle_A = \frac{4\pi D_0^2}{A}; \qquad \langle\mu^{-1}\rangle_\Omega = \frac{A}{4\pi D_0^2}. \tag{24}$$


Weinberg’s flux-conservation argument is that the mean flux
density is the ratio of the telescope aperture to the area of
the outer surface of constant $z$; if this area is increased,
then observers will measure a decreased mean flux density
and vice versa. But as indicated in Figure 2 there are
good grounds to expect that the outer surface is indeed not
precisely spherical in the presence of foreground inhomogeneities.
We now elaborate on this issue and calculate the
corrections to (24), which turn out to be very small - on
the order of the squared cumulative deflection angle.

Equations (24) indicate a reciprocal relationship
$\langle\mu\rangle_A = 1/\langle\mu^{-1}\rangle_\Omega$. If we relax the assumption that the area is
unperturbed by lensing we can generalise this as follows. If we
consider a solid angle $d\Omega$, the number of unit areas $N$ that
fall within this beam on the surface at redshift $z$ is proportional
to $dA/d\Omega$, while the flux density of standard sources
(or $1/D^2$) is inversely proportional to this. The mean flux
density - or equivalently the mean inverse apparent distance
squared - is therefore

$$\left\langle\frac{1}{D^2}\right\rangle_A = \frac{\int d\Omega\,N\,d\Omega/dA}{\int d\Omega\,N} = \frac{\int d\Omega}{\int d\Omega\,dA/d\Omega} = \left\langle\frac{dA}{d\Omega}\right\rangle_\Omega^{-1}. \tag{25}$$

The solid angle here can be considered to be at the observer,
in which case the area weighted average is an average over
the sources seen by that observer, or it may be considered
to be at the source, in which case the area weighted average
is an average over observers as considered in Weinberg’s argument.
Regardless of which interpretation one adopts, the
above formula says that, as before, we have
$\langle\mu^{-1}\rangle_\Omega = 1/\langle\mu\rangle_A$
so the reciprocity of these averages is valid in general. This
relation means that one can calculate the mean of $1/D^2$ for
the sources seen by an observer by calculating the average
over that observer’s sky of $dA/d\Omega$ and then taking the inverse.
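
The content of (25) is that the reciprocity survives even when the total area is perturbed. A minimal sketch, assuming a handful of equal-solid-angle patches with arbitrary, hypothetical Jacobians $dA/d\Omega$ that do not sum to the unperturbed area:

```python
def mean_flux_and_jacobian(jacobian):
    """jacobian[i] = dA/dOmega for equal-solid-angle sky patches (all > 0).

    Returns (<dOmega/dA>_A, <dA/dOmega>_Omega): the area-weighted mean of
    the inverse Jacobian and the sky-averaged Jacobian.
    """
    n = len(jacobian)
    total_area = sum(jacobian)          # each patch has unit solid angle
    inv_jac_area_avg = sum(j * (1.0 / j) for j in jacobian) / total_area
    jac_sky_avg = total_area / n
    return inv_jac_area_avg, jac_sky_avg

# Arbitrary illustrative values; the total area is deliberately not 'conserved'.
jac = [0.9, 1.05, 1.2, 1.0, 1.1]
flux_mean, jac_mean = mean_flux_and_jacobian(jac)
print(flux_mean, jac_mean, flux_mean * jac_mean)
```

Neither average is unity here, but their product is, which is all that (25) asserts.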

But if the area is biased, this cannot be to the same
extent for all observers (there will always be some rare observers
who inhabit spheres containing negligible fluctuations).
Ideally, therefore, one would want to know the probability
distribution for the sky average of $dA/d\Omega$, but calculating
that is very difficult. What is amenable to calculation,
however, is the ensemble average of $\langle dA/d\Omega\rangle_\Omega$
(averaged over an ensemble of randomly placed observers).
What makes this tractable is the fact that we wish, naturally,
to assume that the metric perturbations take the form
of a statistically homogeneous and isotropic random field.
Under that assumption the ensemble average of $\langle dA/d\Omega\rangle_\Omega$
is precisely the same as the average of $dA/d\Omega$ over an ensemble
of realisations for the metric perturbation field for
a single ray fired from the origin along a random direction
(or along the $z$-axis say). We will denote this average by
$\langle dA/d\Omega\rangle_{\rm ens}$.

This does not provide the full probability distribution
for $\langle dA/d\Omega\rangle_\Omega$, for which one would also need to know, at
least, the RMS fluctuation about the ensemble mean. But
if we assume that the average over any one observer’s sky
of $dA/d\Omega$ comes from a large number of effectively statistically
independent regions then it would seem reasonable to
assume that the sky average for any observer will be given, to
a good approximation, by the ensemble average of the sky
average. And if so it should also be valid to approximate the
mean flux-density amplification of sources for one observer
by the inverse of $D_0^{-2}\langle dA/d\Omega\rangle_{\rm ens}$. We shall therefore calculate,
in the first instance, the ensemble average of $dA/d\Omega$
(which, when multiplied by $4\pi$, gives the ensemble mean of
the area and hence the mean fractional perturbation to the
area $\langle A\rangle_{\rm ens}/A_0 - 1$ with $A_0$ the unperturbed area), although
we shall return to the question of fluctuations shortly.

The calculation of $\langle dA/d\Omega\rangle_{\rm ens}$ is presented in appendix
A. Here we give an overview of the essential points. As before,
we motivate this via a simple model of random over- or
under-densities of scale $L$ and density contrast $\Delta$ for which
the Newtonian gravitational potential - cast in dimensionless
form by dividing by $c^2$ - is $\phi \sim \Delta\,(HL/c)^2$. Along the
way we provide the more quantitative key results from appendix
A which are valid for arbitrary random perturbations;
the mean area perturbation being expressed purely in
terms of the 2-point function of the metric perturbations,
independent of higher-order statistics. Following this, in §3.3
we show that the mean bias is dominated by structures of
scale of order tens of Mpc. In §3.4 we return to the question
of how large are the fluctuations in the area for any
particular observer compared to the ensemble average. In
the interest of clarity henceforth all averages $\langle\ldots\rangle$ will be
understood to be ensemble averages unless otherwise explicitly
indicated.


3.1 Surface of constant distance travelled 


We first consider a geometrical effect: owing to the wiggly
nature of the light paths, the radius reached by a path
of total length $\lambda_0$ will be less than $\lambda_0$. As usual, we carry
out this calculation viewing the rays as propagating backwards
in time from the observer. The light deflection angle
by a single localised structure is the integral of the transverse
potential gradient through the structure so this is
$\theta_1 \sim L\nabla_\perp\phi \sim \phi$. As we are assuming a spatially flat background,
simple geometry tells us that the deflector must
be displaced from the straight line from observer to the
surface by $d_\perp = \theta_1\lambda_{od}\lambda_{ds}/\lambda_{os}$, where the subscripts denote
observer, deflector and (source) surface and where $\lambda$ is
background conformal coordinate distance along the path.
Pythagoras tells us that the change in distance reached
(as compared to the sum of the hypotenuses $\lambda_{od} + \lambda_{ds}$) is
$\Delta\lambda = -(1/2)\,d_\perp^2\,\lambda_{os}/(\lambda_{od}\lambda_{ds})$ working to second order precision.
Combining these gives the change in distance reached
$\Delta\lambda = -(1/2)\,\theta_1^2\,\lambda_{od}\lambda_{ds}/\lambda_{os}$. We see here the usual ‘lensing
kernel’ that suppresses the effect of deflectors close to either
end of the path. Thus there is no scope for anomalously large
effect from deflectors near the end point.
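
The single-deflector geometry can be checked directly. The following sketch, with illustrative names and units in which the observer-surface distance is one, compares the exact Pythagorean deficit with the second-order estimate $-(1/2)\theta_1^2\lambda_{od}\lambda_{ds}/\lambda_{os}$ for the same deflection angle at three positions along the path.

```python
import math

def distance_deficit(lam_od_axis, lam_ds_axis, d_perp):
    """One small deflection at transverse offset d_perp from the straight line.

    lam_od_axis, lam_ds_axis: distances along the straight observer-surface
    line to the foot of the deflector, and from there to the end point.
    Returns (exact, second_order): the straight-line distance reached minus
    the path length travelled, and the estimate -(1/2) theta^2 l_od l_ds / l_os.
    """
    leg_od = math.hypot(lam_od_axis, d_perp)
    leg_ds = math.hypot(lam_ds_axis, d_perp)
    lam_os = lam_od_axis + lam_ds_axis
    exact = lam_os - (leg_od + leg_ds)
    # Total bend is the sum of the two small angles at observer and end point.
    theta = math.atan2(d_perp, lam_od_axis) + math.atan2(d_perp, lam_ds_axis)
    second_order = -0.5 * theta ** 2 * leg_od * leg_ds / lam_os
    return exact, second_order

theta_1 = 1e-3                       # assumed deflection angle in radians
results = {}
for frac in (0.05, 0.5, 0.95):       # deflector position as a fraction of the path
    a, b = frac, 1.0 - frac
    d = theta_1 * a * b / (a + b)    # offset that produces a bend of about theta_1
    results[frac] = distance_deficit(a, b, d)
    print(frac, results[frac])
```

For a fixed deflection angle the deficit is largest for a deflector midway along the path and is suppressed near either end, which is the lensing kernel at work.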

We next assume that the effect (on the distance
reached) of the $N \sim \lambda/L$ multiple deflectors along a line
of sight is simply the sum of the effects of individual deflectors.
In this regard, we note that in the above paragraph
we are not ‘solving the lens equation’ for some given configuration
of observer, source and deflector. We simply fire
off a ray in an arbitrary direction that happens to meet a
deflector, and ask how far away in background coordinates
will the end of the ray be after it travels a net path length
$\lambda_{od} + \lambda_{ds}$. Thus the reduction in background conformal distance
from $N$ independent deflectors is just the sum of the
(second order) effects from individual small angular deflections,
to give $\langle\Delta r\rangle/r \sim \langle\theta^2\rangle$ and a corresponding change
in area of twice this. This is confirmed in the perturbative
regime in appendix A where we find that the perturbation
to the area of the constant distance travelled surface is


{AA)/Ao = 2{Ar)/r 



>^0 

J dX A(Ao - A) J(A), 
0 


(26) 


with $J$ as defined in the appendix and where we see, as expected from
the consideration of a single deflector, the presence of the
lensing kernel $\lambda(\lambda_0 - \lambda)/\lambda_0$.

The mean perturbation to the area of the constant
distance travelled surface is thus determined solely by the
power spectrum, or equivalently by the 2-point correlation
function, of the metric fluctuations. The perturbation to the
distance reached for any individual line of sight is also of second
order in the metric perturbations, there being no first
order perturbation. This means that if the size of the perturbations
$L$ (or the correlation length) is much less than the
path length - which is a good approximation for the structures
that are relevant here - the variation in $\Delta r/r$
between different paths will be very small. More precisely,
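we would expect $\langle(\Delta r/r - \langle\Delta r\rangle/r)^2\rangle^{1/2} \sim \sqrt{L/\lambda_0}\,\langle\Delta r\rangle/r \ll \langle\Delta r\rangle/r$,
with the numerical coefficient being determined by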
higher than 2-point statistical properties of the metric fluctuations.
Thus the surface of constant (conformal background
coordinate) distance travelled should be visualised
as almost exactly spherical and with radius in background
coordinates $r = \lambda_0 - \Delta r$.
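
The smallness of the path-to-path scatter can be illustrated with a toy Monte Carlo. It assumes, purely for illustration, that each path crosses $N$ independent deflectors whose deflection angles are Gaussian and that the lensing kernel weighting can be ignored; for Gaussian angles the fractional scatter of the summed second-order effect is $\sqrt{2/N}$, with the coefficient $\sqrt{2}$ set by the fourth moment of the angles.

```python
import math
import random

def fractional_scatter(n_deflectors, n_paths, seed=0):
    """Each path sums n_deflectors independent terms theta_i**2, with theta_i
    drawn from a unit Gaussian (an assumed toy distribution; the lensing
    kernel weighting is ignored). Returns rms(sum) / mean(sum) over paths.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_paths):
        totals.append(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_deflectors)))
    mean = sum(totals) / n_paths
    var = sum((t - mean) ** 2 for t in totals) / n_paths
    return math.sqrt(var) / mean

n_deflectors = 200
simulated = fractional_scatter(n_deflectors, n_paths=1000)
expected = math.sqrt(2.0 / n_deflectors)   # Gaussian-angle prediction
print(simulated, expected)
```

With a path length of many coherence lengths the scatter is a small fraction of the already tiny mean, so the constant distance travelled surface is very nearly a sphere of slightly reduced radius.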


3.2 Surfaces of constant redshift or cosmic time 


We now discuss the countervailing increase of area from
wrinkling of the surface. In the Introduction we made an
analogy with the surface of a swimming pool perturbed by
small amplitude - i.e. height $\ll$ wavelength - random waves,
which yield a fractional change in area of $\langle\theta^2/2\rangle$. We now
explore this in more detail, and draw attention to one shortcoming
of the analogy; but we show that this does not significantly
change the basic conclusion that the cosmological
effect is also of order $\langle\theta^2\rangle$.

Despite the conceptual simplicity, this calculation is
rather more subtle in detail than the radial bias from the
distance-covered effect. A number of terms arise, whose origin
is as follows. We start with a specific beam of solid angle
$d\Omega$ at the observer, which would correspond to an area
$dA_0 = \lambda_0^2\,d\Omega$ at the photosphere in the absence of structure.
As a result of lensing magnification, this beam passes
through a different area $dA$ at the constant distance travelled
surface, which we can write as $dA = r^2 d\Omega'$, where
we are defining the fictitious solid angle $d\Omega'$ as that which
the area $dA$ (which is perpendicular to the outward normal,
though not perpendicular to the beam direction) would subtend
if there were no light deflection. It follows then that

$$\frac{dA}{dA_0} = \frac{d\Omega'}{d\Omega}\left(1 - \frac{2\Delta r}{r}\right). \tag{27}$$


What we actually want is the expectation value of the
area of the intersection of the beam with the actual (i.e.
perturbed) photosphere, which we will denote by $dA'$ (see
Figure 1). This differs from $dA$ by two further multiplicative
factors:

$$\frac{dA'}{dA} = (1 - \theta^2/2)\times(1 + 2\vartheta\,\Delta\lambda). \tag{28}$$

These arise as follows. The beam is not, in general, perpendicular
to the surface of constant distance travelled but has
some tilt, which we denote here by $\theta$. This is a first order
quantity that we compute using the geodesic equation. The
first factor (times $dA$) is therefore the cross-sectional area
of the beam at that point. The second factor in (28) is the
amount by which the beam expands or contracts in passing
from the surface of constant distance travelled to the actual
photosphere. Here $\vartheta = \dot{A}/2A$ is the expansion rate, with $A$
the cross-sectional beam area in conformal background coordinate
units and $\dot{A}$ its rate of change with path length,
and $\Delta\lambda$ is the extra path length - which may be positive or
negative - caused by the gravitational time delay:

$$\Delta\lambda = 2\int d\lambda\,\phi. \tag{29}$$


As we are considering the effect of intervening lenses we
can ignore the effect of the perturbations at the end of the
rays, so the photosphere is the intersection of our past light
cone with the surface of constant cosmic time $t = t_{\rm rec}$. That
means that it is a surface of constant optical path or, equivalently,
a wave-front or location of a backward propagating
pulse of radiation. It is therefore perpendicular to the ray
direction at the end of the beam, so there is no additional
angular correction factor needed in (28).


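In the absence of perturbations the beam expansion rate
is just $\vartheta = 1/\lambda$, so we can write $\vartheta = 1/\lambda + \Delta\vartheta$, where $\Delta\vartheta$
is the perturbation to the expansion, and combine (27) and
(28) to obtain

$$\frac{dA'}{dA_0} = \frac{d\Omega'}{d\Omega} \times \left(1 + \frac{2\Delta r}{r}\right) \times (1 - \theta^2/2) \times (1 + 2(1/\lambda + \Delta\vartheta)\Delta\lambda). \qquad (30)$$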


We need to ensemble average this equation, retaining all
terms up to 2nd order. Parts of this are straightforward: $\Delta r$
is a 2nd order quantity so we do not need to worry about
correlations between it and any other factors. The same is
true of $\theta^2$. In appendix A we also show that $\langle d\Omega'/d\Omega\rangle = 1$.
For the present calculation, we can therefore replace $d\Omega'/d\Omega$
by $1 - 2\kappa$, and the only complication is to allow for the
correlation between $\kappa$ and other first-order terms. We now
discuss the various factors here in terms of the ‘random blobs’
model.

The first order (i.e. Born approximation) time-delay -
or perturbation to the path length to the photosphere -
for a single perturber is $\Delta\lambda_1 = 2\int d\lambda\,\phi$, where the integral
is through the structure, so $\Delta\lambda_1 \sim \phi L$. The cumulative
effect is a random sum of $N$ of these with RMS value
$\Delta\lambda \sim \sqrt{N}\Delta\lambda_1 \sim \phi\sqrt{\lambda L}$, where now $\phi$ is the RMS potential
fluctuation. This averages to zero when multiplied
by the zeroth order expansion $1/\lambda$ but it correlates with
the first order expansion $\Delta\vartheta$. At the end of a path that
happens to be over-dense, both $\Delta\lambda$ and $\Delta\vartheta$ will be negative,
and vice versa for an under-dense path. The result is
a systematic positive bias to the area at 2nd order. Now
$\Delta\vartheta$ is the rate of change of the first order convergence $\kappa$
so $|\Delta\vartheta| \sim |\kappa|/\lambda$. Since $|\kappa| \sim \phi(\lambda/L)^{3/2}$ this means that
$\langle\Delta\lambda \times \Delta\vartheta\rangle \sim \phi^2\lambda/L$ or just the same, to order of magnitude,
as the effect of the light path wiggling reducing the
distance. In fact, $2\langle\Delta\lambda \times \Delta\vartheta\rangle = \langle\theta^2\rangle$ so this, combined with
the third factor in (30), results in an area increase $1 + \langle\theta^2\rangle/2$
exactly as in the swimming pool analogy.
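
The scalings quoted in this paragraph can be checked with a few lines of arithmetic. The following is only a sketch: the function name `blob_scalings` and the input values (an RMS potential of 1e-5, blobs of size 100 and a path length of 3000 in the same units) are assumptions made for illustration, not values taken from the text.

```python
import math

def blob_scalings(phi, L, lam):
    """Order-of-magnitude estimates in the 'random blobs' model.

    phi : RMS potential fluctuation (dimensionless; assumed value)
    L   : size of a single perturber
    lam : total path length, in the same units as L
    """
    N = lam / L                        # number of blobs along the path
    dlam_1 = phi * L                   # single-blob path-length perturbation
    dlam = math.sqrt(N) * dlam_1       # RMS cumulative value, ~ phi*sqrt(lam*L)
    kappa = phi * (lam / L) ** 1.5     # RMS first-order convergence
    dexp = kappa / lam                 # perturbation to the expansion rate
    area_term = dlam * dexp            # <dlam x dexp>, ~ phi^2 * lam / L
    return {"N": N, "dlam": dlam, "kappa": kappa, "area_term": area_term}

est = blob_scalings(phi=1e-5, L=100.0, lam=3000.0)
```

The product `area_term` reproduces the $\phi^2\lambda/L$ scaling, i.e. the same order as the mean squared deflection.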

The increase of area described so far depends only on 
the variance of the ray directions at the surface. It does not 
seem to depend on where along the path the deflections were 
imposed. Also, and interestingly, we find that for the case 
that there is no evolution of the metric fluctuations (as is the 
case for linear perturbations of an Einstein-de Sitter model) 
this increase in area cancels the decrease (26) from distance
reached being less than distance travelled and the ensemble
average effect would be zero.

But there are two more factors we have not considered.
One is the possibility of a significant 2nd order (i.e. post-Born
approximation) contribution from $\Delta\lambda$ itself, as this
multiplies the zeroth order expansion rate. But in fact this
turns out to be sub-dominant and can be ignored. Finally, we
need to consider the fact that $\kappa$ in $d\Omega'/d\Omega$ is correlated with
the path length perturbation $\Delta\lambda$. This gives a 2nd order
term $-4\langle\kappa\Delta\lambda\rangle/\lambda$. With $\kappa \sim \phi(\lambda/L)^{3/2}$ and $\Delta\lambda \sim \phi\sqrt{\lambda L}$
this is yet another contribution to $\Delta A/A \sim \phi^2\lambda/L$, so it
is also of order $\langle\theta^2\rangle$ and does not change the conclusion
regarding the order of magnitude strength of the effect (but
it does mean that the net effect is not zero for non-evolving
metric fluctuations).

The final result for the fractional change in area, combining
the reduced distance travelled and the area enhancement
from wrinkling, is obtained in appendix A:

$$\langle\Delta A\rangle/A_0 = \frac{1}{\lambda_0^2}\int_0^{\lambda_0} d\lambda\,(2\lambda(\lambda_0 - \lambda) + \lambda^2)\,J(\lambda). \qquad (31)$$

This result is of second order in the metric fluctuations and
is valid at leading order in the assumed small parameter
$L/\lambda$. For constant $J$ this is $\langle\Delta A\rangle/A_0 = +(2/3)\lambda_0 J$, which is
positive - so the competing effects of paths wiggling and surface
crinkling do not cancel. However, as anticipated in the
order-of-magnitude argument presented in the Introduction,
the change is extremely small: roughly a part-in-a-million effect.
Appendix A shows that $J$ may also be interpreted as
the rate of change of the squared transverse deflection with
path length, so quite generally the perturbation to the area
is on the order of the cumulative deflection angle squared.
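
As a consistency check on the constant-$J$ limit quoted above, the integral in equation (31) can be evaluated numerically. This is a minimal sketch: the helper name `area_perturbation`, the midpoint rule, and the illustrative values of $\lambda_0$ and $J$ are all assumptions introduced here, and the prefactor $1/\lambda_0^2$ is the one implied by the $(2/3)\lambda_0 J$ limit.

```python
def area_perturbation(J, lam0, n=20000):
    """Midpoint-rule evaluation of eq. (31) for a user-supplied J(lambda)."""
    h = lam0 / n
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) * h
        # integrand of eq. (31): (2*lam*(lam0 - lam) + lam^2) * J(lam)
        total += (2.0 * lam * (lam0 - lam) + lam * lam) * J(lam)
    return total * h / lam0**2

lam0 = 1.0     # path length in arbitrary units (assumed)
J0 = 1e-6      # constant J, purely illustrative
numeric = area_perturbation(lambda lam: J0, lam0)
analytic = (2.0 / 3.0) * lam0 * J0
```

Passing a different callable for `J` gives the corresponding result for an evolving deflection rate.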

If one is concerned with discrete sources, rather than the
CMB, then the observationally relevant area is not a surface
of constant cosmic time, but a surface of constant redshift.
For linear density perturbations - and we will shortly see
that the effect is dominated by such perturbations - the
surface of constant cosmic time is not at constant observed
redshift because of the ISW effect. One result of this, as we
show in §A2, is to change the first order perturbation to the
path $\Delta\lambda$ - to sources at distance $\lambda_0$ as caused by structure at
distance $\lambda$ - introducing a factor $1 + (\phi'/\phi)_\lambda/(a'/a)_{\lambda_0}$ in the
integral in (29). Here $\phi' = d\phi/d\eta$ and $a' = da/d\eta$. Another
is that, unlike the photosphere, this surface is not normal to
the beam direction, so there is an extra factor $1 + \theta'^2/2$ -
where $\theta'^2$ is the squared angle between the normals of the
constant-$z$ and constant cosmic time surfaces - to convert
from cross-section to area at constant $z$. These effects, however,
are only significant for sources at low redshift and do
not qualitatively change our conclusions regarding the size
of the effects.


3.3 What size of structures are important?

Unlike $\langle\kappa^2\rangle$, one can argue that $\langle\theta^2\rangle$ is dominated by large-scale
structure, so that uncertainty from highly non-linear
small-scale structure is negligible, and the overall effect is
definitely extremely small. The evidence from galaxy clustering
- in the quasi-linear and linear regime - is that
$\xi \propto 1/r^2$ or thereabouts. This measures the density variance,
so the density contrast of structures of some scale $L$
is $\Delta \sim \xi^{1/2} \propto 1/L$. As we have seen, the mean squared deflection
is $\langle\theta^2\rangle \sim N\theta_1^2 \sim (HL/c)^3\Delta^2$. With $\Delta \propto 1/L$ this
is an increasing function of scale. This increase does not
continue to indefinitely large scales in conventional models.
As the spectral index increases the total variance converges,
with most of the variance coming from the logarithmic
interval where $n \sim 0$ or scales of tens of Mpc. This
is quantified in Figure 3, which shows the contribution to
$J$ per logarithmic interval of wave-number from equation
(A34): $dJ/d\ln k = 2\pi k\Delta_\phi^2$. As can be seen, the modes
that contribute most strongly have inverse wave-numbers
$k^{-1} \sim 50\,h^{-1}$Mpc, while non-linear structures have very little
effect.

Figure 3. Contribution to $J$ for the concordance model as a
function of wave-number. This quantity, when multiplied by the
path length, gives the fractional perturbation to the area, which
we see here is dominated by modes of scale $k^{-1} \sim 50\,h^{-1}$Mpc.
See §A4.1 for details.

The shear $\gamma$ and the convergence $\kappa$ from sub-horizon
scale structures are much larger, being on the order of $\kappa \sim
\lambda\theta/L \sim (HL/c)^{1/2}\Delta$. In contrast to the deflection angle
this is a decreasing function of scale. For ~100 Mpc scale
structures with $\Delta \sim 15\%$ the convergence is a few percent
(e.g. Seljak 1996) while the deflection is ~30 times smaller
(about a few arc-minutes or $\sim 10^{-3}$ in radians), and $\langle\kappa^2\rangle \sim
10^3\langle\theta^2\rangle$. More quantitatively, equation (31) indicates that
the ensemble average of the fractional change in area caused
by lensing by large-scale structure along the line of sight is
very small, being slightly less than a part-in-a-million effect.


3.4 Fluctuations in the area 

We have calculated the ensemble average of the area of a
surface of some redshift $z$, but it is also relevant to ask if
there could be large fluctuations around this figure. Regarding
the second order effects, we have already shown that
there is very little variation in the distance reached for constant
distance travelled. As for the increase in area from the
wrinkling of the surface, this depends on the square of the
angular tilt of the surface. This will certainly vary between
different directions, but for the scale of perturbations that
are significant for the mean bias there are a large number of
coherence areas over the sky ($N \sim \lambda^2/L^2$) so there should
be small $O(1/\sqrt{N})$ fluctuations in the integral over the sky.
This would suggest that it is very safe to assume that the
change in total photosphere area shows negligible fluctuations
between observers.

But there is the first order contribution to the fluctuation
in the area in (30): $\Delta A/A = 2\Delta\lambda/\lambda$. For any one ray,
this will be of order $\sim\phi(L/\lambda)^{1/2}$, and with $N \sim \lambda^2/L^2$
independent regions on the sky we would expect this effect to
give rise to fluctuations in the net area $\Delta A/A_0 \sim \phi(L/\lambda)^{3/2}$.
For sub-horizon wavelength perturbations this is once again
a tiny effect, but for horizon-scale density perturbations
($L \sim \lambda$) this would be of order the RMS potential fluctuation
on those scales. This would be similar in magnitude to
the fluctuations in temperature of the CMB on these scales,
or $\sim 10^{-4.5}$. In an ensemble average sense this vanishes as
it is a first order effect, but it does mean that, in all likelihood,
the area of our photosphere or a surface of constant
redshift differs from the ensemble mean at this level, which is
actually larger than the ensemble mean perturbation itself.
But this is still a very small effect and is, for all practical
purposes, negligible.
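
The averaging-down argument in this paragraph can be written out explicitly. The sketch below is illustrative: the function name `area_fluctuation` and the sample values are assumptions, and only the order-of-magnitude scalings stated in the text are used (numerical factors of order unity are dropped).

```python
def area_fluctuation(phi, L, lam):
    """Order-of-magnitude RMS fluctuation of the total area, first-order term."""
    per_ray = phi * (L / lam) ** 0.5     # Delta A/A for a single ray
    n_regions = (lam / L) ** 2           # independent coherence areas on the sky
    return per_ray / n_regions ** 0.5    # averages down as 1/sqrt(N)

sub_horizon = area_fluctuation(phi=1e-5, L=100.0, lam=3000.0)
horizon = area_fluctuation(phi=1e-5, L=3000.0, lam=3000.0)
```

For $L \sim \lambda$ there is only one coherence region and the fluctuation is simply the RMS potential, as stated above.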


4 SUMMARY AND DISCUSSION 
4.1 The area of the cosmic photosphere 


The main new result in this paper is to show that gravi¬ 
tational lensing causes a non-vanishing perturbation to the 
area of a surface of constant redshift or of the CMB photosphere.
The result (31) is valid at second order in weak-field
metric fluctuations, and was obtained under the assumption 
that the scale of the perturbations that are responsible for 
the effect is much less than the path length (i.e. that we are 
dealing with sub-horizon scale structures). 

Under these assumptions, the problem is isomorphic to 
optics in a refractive medium with random spatial varia¬ 
tions of the refractive index. The effects here are non-linear, 
but are not in any way associated with the non-linearity 
of Einstein’s equations. The structures involved may have 
$\delta\rho/\rho \gg 1$, but the metric fluctuations are small. We see
no scope for additional intrinsically relativistic effects be¬ 
yond the usual treatment of light deflection in terms of the 
Newtonian potential and the curvature of the spatial hyper¬ 
surfaces. 

Our result was derived in perturbation theory, formally 
assuming that the image shear and magnification along all 
rays are small. In this approximation the dominant contribution
to $\langle\Delta A\rangle/A_0$ comes from structures on scales of tens of
Mpc with height-to-wavelength ratio - and therefore surface
tilt - on the order of $\Delta\lambda/L \sim 10^{-3}$. The mean
of the change of area and hence the mean flux amplification
is much smaller, being on the order $\langle\theta^2\rangle$. This quantity converges
to a well-defined limit, with little contribution from
the smaller-scale structures responsible for strong lensing.

Ray tracing through the Millennium simulation (Hilbert
et al. 2007) shows that the high-$z$ asymptotic optical depth
for strong lensing is only $\tau \sim 10^{-3}$, which is dominated by
clusters of mass $M \sim 10^{14}M_\odot$, with a similar optical depth
probably arising from galaxy-scale haloes if baryonic effects
are taken into account (Hilbert et al. 2008). It is possible
that much smaller-scale structures cause most rays to high
$z$ to be significantly sheared and amplified, and the constant-$z$
surface may be fractal on small scales, as argued
by EBD98 and discussed in s But we believe that our 
result remains valid for the following reason. A high degree 
of small-scale folding of the surface might conceivably re¬ 
sult in a large decrease in the mean flux density, but only 
if the multiple images are resolved. For unresolved, or flux 
density aggregated, sources any further change to the mean 
flux density is negligible compared to the (already tiny) ef¬ 
fect from large-scale structure, simply because the bending 
angles associated with small-scale strong lenses are so small.

In particular, we may ask whether the neglect of small- 
scale strong lensing could have a significant impact on the 
CMB. Like Ellis, Bassett & Dunsby (1998) we do not see 
how arcminute-scale strong lensing can affect the observed 
CMB sky at degree scales, in contrast to CUMD14’s claim of 
percent level effects by including structures down to scales 
of order 10 kpc. This is because the area of the photosphere 
mapped to by a disk of solid angle $\Delta\Omega$ is determined, at the
linear level, only by the mass density excess within the tube
that the boundary of $\Delta\Omega$ traces out. This is a consequence
of the 2-dimensional version of Gauss’s law. Unlike paths to
sources, which tend to avoid over-densities (see appendix E),
beams of randomly chosen direction sample a density that
is unbiased. The increase in $\Delta A/\Delta\Omega$ for those paths that
pass between clusters is compensated for by the decrease for
those beams that happen to encompass a cluster.

We noted the minor distinction between a surface of
constant $z$ and the photosphere. These are not precisely
the same, as the Rees-Sciama and related effects cause a
slight perturbation to the redshift of the photosphere. This
changes the area perturbation but does not qualitatively
change our essential conclusion.


4.2 Lensing conservation theorems 

The fact that the area of constant-redshift surfaces is in 
practice invariant justifies Weinberg’s (1976) claim that 
the mean flux density, or equivalently the mean inverse 
square distance, is unchanged by lensing when averaged over 
sources. It also confirms the complementary result of Kibble 
& Lieu (2005), that the inverse amplification averaged over 
directions is also unperturbed. 

Nevertheless, a major thrust of this paper has been to 
emphasise the importance of statistical bias in any non- 
conserved quantities - anything that is a non-linear function 
of magnification (or its inverse if averaging over directions). 
This includes distances and distance moduli. We have pro¬ 
vided formulae (equations |21[ ) for various examples of 
these biases in the perturbative regime and have shown that
these are on the order of $\langle\kappa^2\rangle$. Recent claims in the literature
that find large results from non-linear relativistic perturbation
theory for e.g. the perturbation to the area of the
photosphere seem to have resulted from a confusion of these 
effects and between source and direction averaging. 

We have shown in appendix that these effects may 
also be derived, although with considerable difficulty, from 
the focusing equation obtained from the optical scalar for¬ 
malism. We have also described how the non-vanishing mean 
inverse magnification of sources can be understood as aris¬ 
ing because light paths to sources tend to avoid over-dense 
regions and therefore sample paths that have a convergence
that is, on average, negative. We find that the fractional bias
in column density is $\langle\kappa\rangle_A = -2\langle\kappa^2\rangle$, but this was obtained
in the perturbative regime and may only be a crude model
for real absorption line studies.

The virtue of the optical scalar analysis is that it is 
more explicitly relativistic in form. Agreement with our 
more simple-minded discussion in terms of lengths of rays 
and wrinkling of surfaces therefore provides some reassur¬ 
ance that this viewpoint is not lacking some subtle non- 
Newtonian relativistic effect. 

This analysis also helps clarify the meaning of the focusing
equation. We have emphasised that despite the
RHS of this being on average greater than in a structure-free
universe, it does not indicate any tendency for structure
to focus beams in the sense of changing their average area.
The perturbation to the distance $\langle D\rangle/D_0 = 1 - \langle\kappa^2\rangle/2$ obtained
from averaging the focusing equation in appendix [P]
is just what is obtained under the assumption that the mean beam
area is precisely unperturbed. Thus, despite its name, the focusing
theorem does not reflect any particular tendency for
cosmic inhomogeneity to cause any systematic gravitational
amplification of source flux densities. There is a real systematic
change to the square root of the beam area, which is potentially
large, being on the order of $\sim\langle\kappa^2\rangle$ or $\sim\phi^2\lambda^3/L^3$,
but once again this simply reflects the statistical bias in $\sqrt{A}$
owing to $dA/d\Omega$ being a fluctuating quantity. The effect on
the area in perturbation theory is suppressed relative to the
mean distance perturbation by two powers of $L/\lambda$ to give
the averaged un-focusing theorem

$$\langle\Delta A\rangle/A_0 = 0 + O(\phi^2\lambda/L) \qquad (32)$$

or, for all practical purposes, $\langle\Delta A\rangle/A_0 = 0$. Contrary to
what the focusing theorem might naively be taken to suggest,
beams of light tracked back in time from the observer
actually wind their way through an inhomogeneous universe
with barely any change to their average area.

4.3 Possible statistical biases 

Evidently, for these $\sim\langle\kappa^2\rangle$ effects, the distinction between
sky-plane and source-plane averaging is important, as is the
choice of variable used as the diagnostic. Depending on the
latter issue in particular, there may or may not be a bias.
We therefore need to look at how analyses are actually performed
in the critical cosmological cases of SNIa and CMB
analyses.

For the case of SNIa, lensing is routinely included in
modern analyses. For example, Sullivan et al. (2011) and Conley
et al. (2011) account for a magnitude scatter of $\sigma_m = 0.055z$
in their fitting. Interestingly, however, the magnitudes are
implicitly taken to be unbiased in the regression procedure,
and it is not clear that this is correct. Denoting flux density
by $S$, this is affected by the magnification as $S \propto \mu$, so that
$\langle S/S_0\rangle = 1$ (under area averaging, as is appropriate in this
case). But

$$\langle\ln(S/S_0)\rangle = \langle\ln(1 + \Delta\mu)\rangle \simeq -\frac{1}{2}\langle(\Delta\mu)^2\rangle.$$

This relation is not entirely straightforward, since we need
to worry about what happens at $S = 0$ in performing the
averaging. It is safer to work in reverse and ask if, for a Gaussian
distribution of $\ln S$ centred at zero, the flux is unbiased;
it is not, by the same offset given above. Thus by fitting to
magnitudes, high-$z$ supernovae are in effect treated as being
fainter than they should be for their redshift. The effect is
most marked at high $z$, where the dispersion is largest. For
example, at $z = 2$ the nominal $\sigma_m = 0.11$ yields a 0.3%
increase in distance, which is equivalent to a shift of about
$\Delta w = 0.01$ in the dark-energy equation of state. With the
precision of present data, this effect is therefore unimportant
- but a more careful incorporation of the constraint of
flux conservation may be necessary in future generations of
experiment, with a target of sub-percent precision in $w$.
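
The size of this offset can be reproduced with a short calculation. The sketch below assumes, as in the text, that $\ln S$ is Gaussian and that the mean flux is conserved, so that $\langle\ln(S/S_0)\rangle = -\sigma_{\ln S}^2/2$; the function name `lognormal_flux_bias` is introduced here purely for illustration.

```python
import math

def lognormal_flux_bias(sigma_m):
    """Bias from fitting magnitudes when the mean flux is conserved.

    Assumes ln S is Gaussian with <S/S0> = 1, so <ln(S/S0)> = -sigma_lnS**2 / 2.
    """
    sigma_lnS = sigma_m * math.log(10.0) / 2.5          # magnitudes -> ln(flux)
    mean_lnS = -0.5 * sigma_lnS**2                      # offset of <ln S>
    mean_mag_shift = -2.5 / math.log(10.0) * mean_lnS   # positive means fainter
    frac_distance = -0.5 * mean_lnS                     # D ~ S**-0.5
    return mean_lnS, mean_mag_shift, frac_distance

_, dm, dD = lognormal_flux_bias(0.11)   # sigma_m = 0.11, the z = 2 value in the text
```

With $\sigma_m = 0.11$ the fractional distance shift `dD` comes out between 0.2% and 0.3%, in line with the figure quoted above.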

In the case of CMB anisotropy measurements, it might
seem that the appropriate average is sky-plane. The observers
decide a priori where to look and measure some
property such as the angular harmonic $\ell$ of the first peak of
the angular power spectrum. One could imagine averaging
this quantity over different patches of the sky. That measurement
would be biased by lensing, since $\ell_{\rm peak} \propto \mu^{-1/2}$.
But if one were to average $\ell_{\rm peak}^2$, which is proportional to
the inverse magnification, this would not be biased. But
one could also, in principle, detect peaks on the CMB sky
and use their curvature as a cosmological diagnostic, and
then average over peaks (which are equivalent to sources).
Clearly in a region with positive (negative) lensing amplification
both the number and curvature of the peaks will be
biased low (high) so the result would be biased. But if one
were to peak-average the inverse curvature this is like $\langle\mu\rangle_A$
and the result would be unbiased.
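
A toy calculation makes the distinction between the two sky-plane averages concrete. The sketch below uses a hypothetical two-patch sky in which the inverse magnification takes the values $1 \pm \epsilon$ with equal probability, so that the direction-averaged inverse magnification is exactly unity; the function name `patch_averages` and the value $\epsilon = 0.1$ are assumptions made for illustration.

```python
import math

def patch_averages(eps):
    """Two equally likely patches with inverse magnification 1 +/- eps."""
    inv_mu = [1.0 + eps, 1.0 - eps]                        # <1/mu> = 1 by construction
    mean_inv_mu = sum(inv_mu) / 2.0                        # stands in for <l_peak^2>
    mean_sqrt = sum(math.sqrt(x) for x in inv_mu) / 2.0    # stands in for <l_peak>
    return mean_inv_mu, mean_sqrt

mean_l2, mean_l = patch_averages(0.1)
```

The average of the square is unbiased, while the average of the square root falls below unity by approximately $\epsilon^2/8$, which is the kind of statistical bias in a non-linear function of a conserved quantity discussed above.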

While it is therefore possible to analyse CMB data in 
a biased manner, the standard analysis method is not sus¬ 
ceptible to such a bias. What is done (Hu 2000; Challinor & 
Lewis 2005) is to calculate the angular power spectrum by 
modelling the observed sky as the primary fluctuations on 
an unperturbed sphere being distorted by the transverse de¬ 
flections from foreground structures, and keeping terms up 
to second order in the Newtonian potential (e.g. equation 
15 of Challinor & Lewis 2005). Cosmological parameters are 
obtained by computing the likelihood as the probability of 
the actual measured spectrum given this lensed prediction 
(as a function of the parameters of interest). Thus there is no 
scope to obtain a bias in the inferred parameters since the 
relevant quadratic effects arising from lensing are already 
properly accounted for. Any apparent tensions between the 
CMB and astrophysical estimates of parameters such as $H_0$
cannot be explained by lensing. 


APPENDIX A: THE PERTURBATION TO THE
SOURCE SURFACE AREA 

We now compute the second order correction to the area of 
the cosmic photosphere or of a surface of constant redshift. 
We first justify the use of weak-field metric perturbations 
and we note the analogy between light propagation in a 
weakly perturbed FRW cosmology and in a medium with 
a non-uniform refractive index. We next discuss the bound¬ 
ary conditions for the end of the rays (which for the pho¬ 
tosphere corresponds to a surface of constant optical path 
in the lumpy glass analogy). We then perform the calcula¬ 
tion; we do this in two steps. We first calculate the distance 
reached after propagating a given path length Aq (i.e. a fixed 
path length in background coordinates) and the mean area 
of the intersection of a narrow bundle of rays with given 
solid angle at the observer with this surface. We then com¬ 
pute the extra contribution to the mean area that arises by 
propagating the extra (positive or negative) distance to the 
surface of constant optical path, allowing for the correlations 
between the various first order effects (e.g. the expansion
rate of the bundle and the extra displacement). 


A1 Light deflection in weak field gravity

We are interested in very weak field perturbations to FRW 
cosmology - metric fluctuations of order ~ or 

smaller - associated with very nearly Newtonian perturba¬ 
tions of scale (mostly) much less than the horizon size. For 
simplicity we will consider a flat background, as this seems 
to be a very good approximation to reality. We first consider 
the very weak field limit in which the metric has only one 
degree of freedom (in GR). Then we generalise to a met¬ 
ric that includes the off-diagonal terms associated with bulk 
motion of matter and show that this has an extremely small 
effect on the deflection of light rays (much less than one 
might imagine from the size of the metric perturbation). 


A1.1 Light deflection by a static source

The weak field metric is usually taken to be

$$ds^2 = a^2(\eta)\left[-(1 + 2\psi)d\eta^2 + (1 - 2\phi)(dx^2 + dy^2 + dz^2)\right] \qquad (A1)$$

where the potentials $\psi$ and $\phi$ are some functions of the coordinates.
In GR the two potentials are equal for nonrelativistic
matter perturbations, $\psi = \phi$, and we assume that henceforth.
The linearised versions of Einstein’s equations show
that this is the metric generated by non-relativistic matter
with density fluctuations related to $\phi$ by $\nabla^2\phi = 4\pi G\delta\rho/c^2$
where the Laplacian is in proper coordinates and $\delta\rho$ is
the matter density perturbation. As mentioned in the Introduction,
we take the scale factor to be dimensionless:
$a = 1/(1 + z(\eta))$, so our conformal coordinates have dimensions
of length.

Writing this as

$$ds^2 = a^2(\eta)(1 + 2\phi)\left[-d\eta^2 + n^2(dx^2 + dy^2 + dz^2)\right] \qquad (A2)$$

with

$$n = \sqrt{\frac{1 - 2\phi}{1 + 2\phi}} \simeq 1 - 2\phi \qquad (A3)$$

we see that null rays have coordinate speed $d|\mathbf{r}|/d\eta = 1/n$
and extremise the coordinate time:

$$\delta\int d|\mathbf{r}|\,n(\mathbf{r}) = \delta\int d\lambda\,|\dot{\mathbf{r}}|\,n(\mathbf{r}) = 0, \qquad (A4)$$

where $\dot{\mathbf{r}} = d\mathbf{r}/d\lambda$ and where the parameterisation of the path
$\mathbf{r}(\lambda)$ is arbitrary. The Euler-Lagrange equation from this is

$$n\ddot{\mathbf{r}} + \dot{\mathbf{r}}(\dot{\mathbf{r}}\cdot\nabla n) - n\dot{\mathbf{r}}\,\frac{\dot{\mathbf{r}}\cdot\ddot{\mathbf{r}}}{|\dot{\mathbf{r}}|^2} = |\dot{\mathbf{r}}|^2\nabla n \qquad (A5)$$

which is not particularly useful, but if we fix $\lambda$ to be equal
to the coordinate distance along the path (so $|\dot{\mathbf{r}}| = 1$), the
geodesic equation is

$$\ddot{\mathbf{r}} = \nabla_\perp\tilde{n} \qquad (A6)$$

where $\tilde{n} = \ln n$ and $\nabla_\perp = \nabla - \dot{\mathbf{r}}(\dot{\mathbf{r}}\cdot\nabla)$ is the derivative in
the direction perpendicular to $\dot{\mathbf{r}}$. This is exactly the same
as Snell’s law for rays propagating in a refractive medium
with refractive index $n$ (Born & Wolf 1965). Optics in an
expanding universe with this metric is the same as in lumpy
glass with conformal coordinates playing the same role as
ordinary physical spatial coordinates in a refractive medium.
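
The lumpy-glass picture can be illustrated by integrating the ray equation (A6) directly through a single perturber and comparing with the first-order (Born) deflection, $-2\int dz\,\partial\phi/\partial x$. The following is a minimal sketch under stated assumptions: a two-dimensional Gaussian blob $\phi = \phi_0\exp[-(x^2+z^2)/2L^2]$, a fourth-order Runge-Kutta integrator, and the helper names `grad_ln_n`, `trace_ray` and `born_deflection` are all choices made here for illustration rather than anything specified in the text.

```python
import math

def grad_ln_n(x, z, phi0, L):
    """Gradient of ln n for n = sqrt((1-2phi)/(1+2phi)) and a Gaussian blob phi."""
    p = phi0 * math.exp(-(x * x + z * z) / (2.0 * L * L))
    pref = -2.0 / (1.0 - 4.0 * p * p)            # d(ln n)/d(phi)
    return pref * (-x / L**2) * p, pref * (-z / L**2) * p

def rhs(state, phi0, L):
    x, z, vx, vz = state
    gx, gz = grad_ln_n(x, z, phi0, L)
    along = vx * gx + vz * gz                    # gradient component along the ray
    return (vx, vz, gx - vx * along, gz - vz * along)   # eq. (A6)

def trace_ray(b, phi0, L, half_length=10.0, n_steps=4000):
    """RK4 integration of (A6) for a ray starting parallel to z at offset x = b."""
    h = 2.0 * half_length * L / n_steps
    s = (b, -half_length * L, 0.0, 1.0)
    for _ in range(n_steps):
        k1 = rhs(s, phi0, L)
        k2 = rhs(tuple(a + 0.5 * h * k for a, k in zip(s, k1)), phi0, L)
        k3 = rhs(tuple(a + 0.5 * h * k for a, k in zip(s, k2)), phi0, L)
        k4 = rhs(tuple(a + h * k for a, k in zip(s, k3)), phi0, L)
        s = tuple(a + h * (p + 2 * q + 2 * r + t) / 6.0
                  for a, p, q, r, t in zip(s, k1, k2, k3, k4))
    return s[2]                                  # x-component of the unit tangent

def born_deflection(b, phi0, L):
    """-2 times the integral of d(phi)/dx along the unperturbed path."""
    return (2.0 * phi0 * (b / L) * math.sqrt(2.0 * math.pi)
            * math.exp(-b * b / (2.0 * L * L)))

numeric = trace_ray(b=1.0, phi0=-1e-5, L=1.0)    # potential well, phi0 < 0
born = born_deflection(b=1.0, phi0=-1e-5, L=1.0)
```

For a potential well the ray is bent towards the lens, and for weak fields the integrated deflection agrees with the Born value to a fractional accuracy of order $\phi$.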

Additionally, if the potential is only a function of con¬ 
formal position, and not changing with time, then the lenses 
induce no change in the redshift of the photons. But if the 
potential is changing - either because one is dealing with 
perturbations in a non Einstein-de Sitter background or be¬ 
cause the perturbations are non-linear or moving or evolv¬ 
ing internally - then there will be redshift perturbations 
(Sachs & Wolfe 1967; Rees & Sciama 1968; Birkinshaw & 
Gull 1983). These again are the same as would apply
in a medium with a time varying refractive index. If the optical
path length - i.e. the number of waves along the path
- is changing with time then $\nu_{\rm rec}$, the number of waves per
unit time at the receiver, will be the number of waves per
unit time at the emitter minus the rate of change of the
optical path, so $\nu_{\rm rec} = \nu_{\rm em}(1 + \int d\lambda\,\partial n/\partial\eta)$. This provides
a novel way of thinking about the Integrated Sachs-Wolfe 
effect, and it is a phenomenon that is routinely used to tune 
frequencies in optoelectronics. 

Expressed in terms of the potential $\phi$, the geodesic equation
is, from (A3),

$$\ddot{\mathbf{r}} = \frac{-2\nabla_\perp\phi}{1 - 4\phi^2}. \qquad (A7)$$

This seems to suggest that the linear formula $\ddot{\mathbf{r}} = -2\nabla_\perp\phi$
would be accurate up to $O(\phi^2\nabla\phi)$. In fact, as we now show,
if we allow for the non-relativistic matter motion we find additional
terms in the geodesic equation, but these are smaller
than the linear term by a factor $\sim\phi$.
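
The step from (A3) to (A7) amounts to the identity $d\ln n/d\phi = -2/(1 - 4\phi^2)$, which can be verified by finite differences. This is only a sketch; the function names and the deliberately large test value $\phi = 0.1$ (chosen so that the $4\phi^2$ term is visible) are assumptions made for illustration.

```python
import math

def ln_n(phi):
    """ln n for n = sqrt((1 - 2 phi)/(1 + 2 phi)), as in (A3)."""
    return 0.5 * (math.log(1.0 - 2.0 * phi) - math.log(1.0 + 2.0 * phi))

def dlnn_dphi_exact(phi):
    """Coefficient multiplying the transverse gradient of phi in (A7)."""
    return -2.0 / (1.0 - 4.0 * phi * phi)

phi, h = 0.1, 1e-6
finite_diff = (ln_n(phi + h) - ln_n(phi - h)) / (2.0 * h)   # central difference
```

For realistic potentials of order $10^{-5}$ the coefficient differs from $-2$ only at the $10^{-9}$ level, which is why the linear formula is so accurate.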


A1.2 Weak fields sourced by moving matter 

The metric (A2) is obtained using linearised gravity (i.e.
working only to first order in the metric perturbations $h_{\alpha\beta} =
g_{\alpha\beta} - \eta_{\alpha\beta}$, with $\eta_{\alpha\beta}$ the Minkowski metric) and incurring
errors on the order $|h|^2$ from e.g. using $\eta_{\alpha\beta}$ to raise and lower
indices. It also assumes that the source of gravity is $T^{\alpha\beta} =
{\rm Diag}\{\rho, 0, 0, 0\}$ with $\rho$ the mass density. A more accurate
model for the metric, still within the context of a linearised
relation between the metric and the Einstein tensor, comes
from including the momentum density source. This is $ds^2 =
g_{\alpha\beta}dx^\alpha dx^\beta$ with

$$g_{\alpha\beta} = a^2(\eta)\begin{pmatrix} -(1 + 2\phi) & \mathbf{V}^T \\ \mathbf{V} & (1 - 2\phi)\mathbf{I} \end{pmatrix} \qquad (A8)$$

where $\mathbf{I}$ is the 3D identity matrix and $\mathbf{V}$ is on the order
of the potential $\phi$ times the peculiar velocity of the matter
$\mathbf{v}$ (we note that in general non-vanishing 3-stress also
introduces differences between the diagonal terms but these
are of order $\phi v^2$ and therefore much smaller).

We now show how these extra ‘frame-dragging’ terms
affect the geodesic equation. The result is very small; the
corrections to the linear term in (A7) being smaller by a
factor $\phi$.

To calculate deflection of light in the space-time (A8)
we can use the geodesic equation

$$\frac{d^2x^\alpha}{ds^2} = -\Gamma^\alpha_{\beta\gamma}\frac{dx^\beta}{ds}\frac{dx^\gamma}{ds} \qquad (A9)$$

where $s$ is the affine parameter (unique up to a constant and
a scale factor) and where

$$\Gamma^\alpha_{\beta\gamma} = \frac{1}{2}g^{\alpha\nu}(g_{\nu\beta,\gamma} + g_{\nu\gamma,\beta} - g_{\beta\gamma,\nu}) \qquad (A10)$$

is the Christoffel symbol (e.g. Weinberg 1972). At zeroth
order in the matter fluctuations $g_{\alpha\beta} = a^2(\eta)\eta_{\alpha\beta}$ and we
have $d^2x^i/d\eta^2 = 0$ and $d^2\eta/ds^2 = -2(a'/a)(d\eta/ds)^2$, where
$a' = da/d\eta$, with solution $d\eta/ds = a^{-2}(\eta)$. Since $d\eta = dt/a$
this means that the energy $\propto dt/ds = a\,d\eta/ds \propto 1/a$ as
usual.

Here we wish to compute properties of rays and wave-fronts
given some statistical prescription for the metric fluctuations
as a function of background coordinates. It is therefore
more useful to let the independent variable in the
geodesic equation be the path distance in background spatial
coordinates. Applying the chain rule, the geodesic equation
with path variable being the $z = x^3$ coordinate, for instance, is

$$\frac{d^2x^\alpha}{dz^2} = -\Gamma^\alpha_{\beta\gamma}\frac{dx^\beta}{dz}\frac{dx^\gamma}{dz} + \Gamma^z_{\beta\gamma}\frac{dx^\beta}{dz}\frac{dx^\gamma}{dz}\frac{dx^\alpha}{dz}. \qquad (A11)$$

If we consider a ray that happens to be travelling along
the $z$ direction (i.e. with $dx/dz = dy/dz = 0$), or rotate
our spatial coordinate system to align the $z$-axis with the
instantaneous ray vector, then the curvature of the path in
the $x - z$ plane is

$$\frac{d^2x}{dz^2} = -\Gamma^x_{\beta\gamma}\frac{dx^\beta}{dz}\frac{dx^\gamma}{dz} = -\Gamma^x_{zz} - n^2\Gamma^x_{\eta\eta} \qquad (A12)$$

and similarly for $d^2y/dz^2$ and where now $n^2 = (d\eta/dz)^2$.
From $g_{\alpha\beta}dx^\alpha dx^\beta = 0$ we find, working to second order,

$$n^2 = (d\eta/dz)^2 = (g_{zz}/{-g_{\eta\eta}})(1 - g_{z\eta}) \qquad (A13)$$

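but since the Christoffel symbols are of first order in the
potential $\phi$ while $g_{z\eta}$ is smaller by a factor $v$ then working
to second order in $\phi$ we can ignore the last factor and take
$n^2 = -g_{zz}/g_{\eta\eta}$ in (A12). This yields

$$\frac{d^2x}{dz^2} = \frac{1}{2}\left[g^{xx}g_{zz,x} - g^{x\eta}(2g_{z\eta,z} - g_{zz,\eta}) + \frac{g_{zz}}{g_{\eta\eta}}\left(g^{xx}(2g_{x\eta,\eta} - g_{\eta\eta,x}) + g^{x\eta}g_{\eta\eta,\eta}\right)\right]. \qquad (A14)$$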
Clearly we can drop the term involving two off-diagonal elements
as this is smaller than second order in $\phi$, as are the
terms involving $g^{x\eta} \simeq -g_{x\eta}$ and a time derivative of one of
the diagonal elements since these are both $\sim(\phi v) \times \phi/\lambda$,
with $\lambda$ being overall path length. Dropping these, and using
the first order approximation for the inverse metric element
$g^{xx} = g_{xx}^{-1}$ as this multiplies a first order quantity, gives

$$\frac{d^2x}{dz^2} = \frac{g_{zz,x}}{2g_{zz}} - \frac{g_{\eta\eta,x}}{2g_{\eta\eta}} + \frac{g_{x\eta,\eta}}{g_{\eta\eta}} = \frac{\partial\tilde{n}}{\partial x} + \frac{g_{x\eta,\eta}}{g_{\eta\eta}}. \qquad (A15)$$

The first term on the RHS is what we obtained in the
previous section, and the second arises from matter motion.
But this new term has no spatial derivative, so ends up producing
very little effect.

For linear perturbations, if we integrate the last expression
through a single structure of size $L$ and potential fluctuation
$\phi$, we get the $x$-component of the deflection angle
$dx/dz$. The first term yields $dx/dz \sim \phi$ while the velocity
$v$ is on the order of $t_u\nabla\phi \sim \lambda\phi/L$, since the age of
the universe $t_u$ is on the order of the path length $\lambda$, so
$\int dz\,g_{\eta x,\eta} \sim Lg_{\eta x}/\lambda \sim \phi^2$. The effect of the off-diagonal
terms is thus smaller by a factor $L/\lambda$ than the naive expectation
from the fact that $h_{x\eta}$ is smaller than $h_{xx}$ by a factor
$\sim v$.


For a non-linear structure such as a cluster, group or
galaxy that is stable but has some bulk peculiar velocity $\mathbf{v}$
the partial derivative with respect to time will be, to order
of magnitude, $g_{x\eta,\eta} \sim \mathbf{v}\cdot\nabla g_{x\eta} \sim \phi v^2/L$ which, when integrated
through the object, gives a contribution to $dx/dz$
that is again on the order of $\phi^2$ since, for virialised systems,
$v^2 \sim \phi$.


In both linear and non-linear regimes the light deflection
is smaller than the (twice) Newtonian value for a test
particle moving with $v = c$ by a factor $\sim\phi$ (i.e. considerably
smaller than one might perhaps have guessed from the
relative size of the diagonal and off-diagonal terms in (A8)).

The metric (A8) is not the most general metric as it
only has four spatial degrees of freedom. The missing in¬ 
gredient is the two degrees of freedom in the gravitational 
waves, but these are not effective for lensing (Kaiser & Jaffe 
1997). This is easily understood in the Fourier space ver¬ 
sion of Limber’s equation (Kaiser 1998) where, for lensing 
by scalar perturbations, the modes that are effective have 
wave-vector perpendicular to the line of sight so that the 
light ray stays in phase with the wave - much like a rapidly 
moving surfer surfing a slowly moving wave - so the de¬ 
flection builds up systematically. For gravitational waves, 
which propagate with |v| = c, this cannot happen. We con¬ 
clude from this that to an extremely good approximation we 
can ignore the additional effect on light deflection from the 
non-relativistic motions associated with structure (as well as 
gravitational radiation) and use the metric (A21, with only 
scalar Newtonian fluctuations. 

We note that owing to the non-linearity of Einstein’s
equations the mean local curvature and stress-energy tensor
implied by this fluctuating metric will not be the same
as for an unperturbed cosmology with the same expansion
rate etc. The Riemann curvature, for instance, contains a
term that is quadratic in the connection and the latter contains
derivatives of the metric so one would expect there to
be a non-zero mean curvature involving e.g. the products of
derivatives of $\phi$ and this carries over into the Einstein tensor
and hence the stress-energy tensor also. An alternative
would be to adopt a model in which the stress-energy tensor
is unperturbed in the mean. This simply requires adding
an appropriate constant Laplacian to the metric perturbations
(i.e. making the spatial sections globally curved). This,
however, would not be appropriate in the context of inflationary
fluctuogenesis where the large-scale spatial flatness
is a consequence of the assumed large initial value for the
inflaton field and the slowness of its roll down the assumed
potential, while the smaller scale fluctuations - that give rise
to the structures we can actually observe - transition from
Planck to horizon scale later and develop metric fluctuations
that must be accommodated within a globally spatially flat
background.


A2 Boundary conditions at the end of the ray 

We are interested in the integrated effect of lensing by structures
along the line of sight. So we can take the density
perturbation on the actual photosphere to vanish and consider
the observed temperature fluctuations generated by
the combination of spatial variation of temperature, Doppler
shift and gravitational redshifts to be a pattern that is
‘painted on’.

This point of view is valid even though the photosphere
- or surface of last scattering - is not a real surface. It is defined
as the part of our 3D past light cone where the cosmic
time is that of recombination $t_{\rm rec}$. This decoupling time is
set by atomic and cosmological parameters, which also set
the acoustic scale of the structures that we can subsequently
view in the CMB. If recombination occurs at a fixed temperature,
it may be wondered why there are any fluctuations in
the CMB at all. One answer is that the effect of fluctuations
is that the recombination temperature is reached at different
times and hence different redshifts - which modifies the
observed temperature. Thus the surface of last scattering
is in reality a surface of constant temperature but varying
redshift. Nevertheless, we can ask what temperature fluctuations
would be observed if we were able to see a surface of
constant redshift, and the answer is that the observed CMB
would be the same.

Ignoring the fluctuations at the end of the rays, the
cosmic photosphere is perpendicular to the direction of the
light rays. In the lumpy glass analogy, this is a surface of
constant optical path length, $\int n\, d\lambda$, which differs from the
physical path length at first order because of time delays
(which in the cosmological context can be positive or negative).
We now ask how the area of the photosphere differs
for rays propagating backward from some observer at some
time, compared to the case of a universe - or a line of sight
- that has no metric perturbations. This involves computing
the area of the surface with 1st order path length perturbation

$$\Delta\lambda = 2 \int d\lambda\, \phi. \tag{A16}$$

This first order time delay gives rise, as we shall see, to a
2nd order increase in the area of the photosphere through
the ‘wrinkling’ effect.

A surface of constant source redshift is not exactly
the same as a surface of constant cosmic time because intervening
perturbations, particularly those at low redshift,
can cause perturbation to the observed CMB temperature
$T_{\rm obs}$ via the integrated Sachs & Wolfe (1967) (ISW) or
Figure A1. Illustration of the first order change in the conformal
spatial path length $\Delta\lambda$ to a surface of constant cosmic time
(like the CMB photosphere). Coordinates are conformal background
position and time $(\lambda, \eta)$. The hatched lines show an over-density
where the coordinate velocity $d\lambda/d\eta$ is changed by a factor
$1 + 2\phi$ at linear order, so $\Delta\lambda = 2 \int d\lambda\, \phi$ (which is negative for
an over-dense path). But the surface of constant cosmic time $\eta$ =
constant is not a surface of constant redshift as it will be affected
by the ISW effect caused by the change of the potential with
time, which occurs at low redshift $z < 1$. A decaying over-density
causes a negative perturbation to the redshift (i.e. a temperature
enhancement $\Delta T/T = \Delta\eta_i/\Delta\eta_f$ for the CMB). For a ray to reach
the surface of constant redshift requires, for an over-dense path,
an extra path to annul the ISW effect so the net path length perturbation
is reduced. The difference for high-redshift sources is on
the order of a few percent even for low-redshift lenses. For lenses
and sources at $z < 1$, the $\Delta\lambda$ for constant redshift is reduced,
as compared to the fictitious case where the potential does not
decay, by about 50%. This approximately nulls the perturbation
to the area at very low $z$, as described in the text.


Rees Sciama (1968) effects, but do not affect $T_{\rm em}$ so
$1 + z = T_{\rm em}/T_{\rm obs} \ne$ constant. Similar effects come from
moving or dynamic lenses (Birkinshaw & Gull 1983). In
the perturbative regime the ISW effect produces a temperature
perturbation for the surface of constant cosmic time
$\Delta T/T = \Delta\eta_i/\Delta\eta_f$ (see Figure A1) or $\Delta T/T = 2 \int d\lambda\, \phi'$
where $\phi' = \partial\phi/\partial\eta$ (which becomes non-zero when the onset
of dark energy domination damps the initial metric fluctuations).
In the background, the temperature is decreasing
as $T \propto 1/a$, so to reach the surface of constant observed
temperature (or redshift) requires an additional path length
$\Delta\lambda = \Delta\eta$ where $\Delta a/a = a'\Delta\eta/a = -\Delta T/T$ or, equivalently,
$\Delta\lambda = -2 (a'/a)^{-1}_{\lambda_0} \int d\lambda\, \phi'$. Consequently the net 1st
order perturbation to the path length to constant redshift is

$$\Delta\lambda = 2 \int d\lambda\, \phi_\lambda \left(1 + \frac{(\phi'/\phi)_\lambda}{(a'/a)_{\lambda_0}}\right). \tag{A17}$$

This has a very small effect for high redshift sources since
for these $(a'/a)_{\lambda_0} \gg (\phi'/\phi)_\lambda$ regardless of $\lambda$ and, as we shall
see, does not qualitatively change the outcome for any source
redshift.


A3 Distance reached vs. distance travelled 


Consider a ray that arrives at the observer by moving along
the $-z$ axis. For rays propagating close to and nearly parallel
to the $z$-axis we can set up 2D perpendicular comoving
coordinates $\mathbf{x}$ and take $\nabla_\perp$ to also be the 2D derivative with
respect to $\mathbf{x}$ at linear order. Again, we use $\lambda$ for distance
along the ray. Assuming small displacements, the transverse
velocity $\dot{\mathbf{x}} = d\mathbf{x}/d\lambda$ (equal to the deflection angle) of this ray
is, at first order in $\phi$, the integral of the geodesic equation:

$$\dot{\mathbf{x}}(\lambda) = -2 \int_0^\lambda d\lambda'\, \nabla_\perp\phi(\lambda') \tag{A18}$$

where the transverse displacement is

$$\mathbf{x}(\lambda) = -2 \int_0^\lambda d\lambda'\, (\lambda - \lambda') \nabla_\perp\phi(\lambda') \tag{A19}$$

and we have integrated by parts.

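We now use this to calculate the mean distance from the
observer of the end of a ray of physical path length $\lambda_0$. After
propagating a partial path length $\lambda < \lambda_0$ the end of the ray
will lie at a direction from the observer that, to first order,
is $\mathbf{n} = \hat{\mathbf{z}} + \mathbf{x}/\lambda$. The amount by which the instantaneous
ray vector $\hat{\mathbf{z}} + \dot{\mathbf{x}}$ differs from this direction is

$$\Delta\dot{\mathbf{x}}(\lambda) = \dot{\mathbf{x}}(\lambda) - \frac{\mathbf{x}(\lambda)}{\lambda} = -\frac{2}{\lambda} \int_0^\lambda d\lambda'\, \lambda' \nabla_\perp\phi(\lambda'). \tag{A20}$$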
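As a quick numerical cross-check of the step from (A18) and (A19) to (A20), the sketch below evaluates both sides for a single transverse component. The gradient profile `grad_phi` and the quadrature helper `integrate` are hypothetical choices made purely for illustration; they are not taken from the paper.

```python
import math

def grad_phi(lam):
    # Hypothetical transverse potential gradient along the ray (one component).
    return 1e-5 * math.cos(3.0 * lam)

def integrate(f, a, b, n=2000):
    # Composite midpoint rule on [a, b].
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def xdot(lam):
    # Transverse velocity, equation (A18).
    return -2.0 * integrate(grad_phi, 0.0, lam)

def x(lam):
    # Transverse displacement, equation (A19).
    return -2.0 * integrate(lambda s: (lam - s) * grad_phi(s), 0.0, lam)

def delta_xdot_direct(lam):
    # Left-hand side of (A20): xdot - x / lambda.
    return xdot(lam) - x(lam) / lam

def delta_xdot_a20(lam):
    # Right-hand side of (A20).
    return -(2.0 / lam) * integrate(lambda s: s * grad_phi(s), 0.0, lam)

if __name__ == "__main__":
    print(delta_xdot_direct(2.0), delta_xdot_a20(2.0))
```

Because all three integrals use the same quadrature nodes, the two expressions should agree to rounding error for any smooth profile.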


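In propagating a further path length $\delta\lambda$ the end of
the ray will advance a distance measured from the observer
$\delta r = \delta\lambda \times (1 - |\Delta\dot{\mathbf{x}}|^2/2) + \ldots$. The distance reached after
propagating a path length $\lambda_0$ is therefore $r(\lambda_0) = \lambda_0 + \Delta r$
where

$$\Delta r = -\frac{1}{2} \int_0^{\lambda_0} d\lambda\, |\Delta\dot{\mathbf{x}}(\lambda)|^2$$
$$\phantom{\Delta r} = -2 \int_0^{\lambda_0} \frac{d\lambda}{\lambda^2} \int_0^\lambda d\lambda'\, \lambda' \nabla_\perp\phi' \cdot \int_0^\lambda d\lambda''\, \lambda'' \nabla_\perp\phi''$$
$$\phantom{\Delta r} = -\frac{4}{\lambda_0} \int_0^{\lambda_0} d\lambda\, (\lambda_0 - \lambda) \nabla_\perp\phi \cdot \int_0^\lambda d\lambda'\, \lambda' \nabla_\perp\phi' \tag{A21}$$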

where $\phi' = \phi(\lambda')$ etc. and where, in passing to the final line,
we have integrated by parts and used

$$\int_0^\lambda d\lambda' \int_0^\lambda d\lambda'' \ldots = 2 \int_0^\lambda d\lambda' \int_0^{\lambda'} d\lambda'' \ldots \tag{A22}$$

which is valid if the integrand [...] is a symmetric function of
its arguments. Regarding notation, here and in what follows,
all gradient operators act only on the function of position
that follows them and $\phi'$ is shorthand for $\phi(\lambda')$ etc.
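The equality of the first and last lines of (A21) holds realisation by realisation, not only on average, so it can be checked numerically. The sketch below does this for one transverse component; the profile `g` is a hypothetical stand-in for $\nabla_\perp\phi$ along the ray and the simple quadrature is an assumption of this example.

```python
import math

def g(s):
    # Hypothetical one-component transverse gradient of the potential.
    return math.cos(3.0 * s)

def delta_r_two_ways(lam0=2.0, n=4000):
    """Return (first line of A21, final line of A21) for the profile g."""
    h = lam0 / n
    first = 0.0   # -(1/2) * integral of |Delta xdot|^2
    last = 0.0    # -(4/lam0) * integral of (lam0 - lam) g(lam) I(lam)
    inner = 0.0   # running I(lam) = integral_0^lam s g(s) ds at the cell edge
    for i in range(n):
        mid = (i + 0.5) * h
        inner_mid = inner + 0.5 * h * mid * g(mid)   # I at the cell midpoint
        dxdot = -(2.0 / mid) * inner_mid             # equation (A20)
        first += -0.5 * dxdot ** 2 * h
        last += -(4.0 / lam0) * (lam0 - mid) * g(mid) * inner_mid * h
        inner += h * mid * g(mid)
    return first, last

if __name__ == "__main__":
    print(delta_r_two_ways())
```

Both numbers are negative: wiggling always shortens the distance reached for a fixed path length.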

Note that we only required the first order deflection
here. Higher order corrections to (A18) & (A19) are irrelevant.

Evidently the perturbation to the distance reached in
propagating a fixed physical path length $\lambda_0$ is a quantity
that is of second order in the potential or refractive index
fluctuations. It follows from this that the surface of constant
physical path length from the observer has no 1st order tilt;
its outward normal is, up to first order, parallel to the local
direction from the observer. As discussed earlier, the ray-to-ray
variations in distance reached are expected to be small
compared to the systematic offset.


A4 Mean distance for constant physical path 

We now express the ensemble mean distance for a constant
physical path length in terms of the auto-correlation function
of the potential $\xi_\phi$. The model we shall adopt is that, at
least locally, the potential is a statistically homogeneous and
isotropic random field. The quantities that we will calculate
are of second order in the potential and so may be obtained
in terms of $\xi_\phi$ without any further assumptions about higher
order statistics (i.e. we do not need to invoke Gaussianity,
so the results are applicable for non-linear density fluctuations).
By ‘local’ above, we are allowing for the possibility
that the potential fluctuations may be statistically homogeneous
at any instant of cosmic time but may depend on
look-back time. If, as is the case in conventional models,
the effects of interest here are dominated by fluctuations
which are much smaller than the Hubble scale it should be
a good approximation to calculate effects by summing the
effect from different shells within which strict homogeneity
is assumed to obtain.

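Writing $\phi$ as a Fourier synthesis

$$\phi(\mathbf{r}) = \int \frac{d^3k}{(2\pi)^3}\, \tilde\phi(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{r}} \tag{A23}$$

and invoking local statistical homogeneity, $\langle\tilde\phi(\mathbf{k})\tilde\phi^*(\mathbf{k}')\rangle =
(2\pi)^3 \delta(\mathbf{k} - \mathbf{k}') P_\phi(|\mathbf{k}|)$, the required ensemble average of the
transverse gradients at two points is

$$\langle\nabla_\perp\phi(\mathbf{r}) \cdot \nabla_\perp\phi(\mathbf{r} + \mathbf{r}')\rangle = \int \frac{d^3k}{(2\pi)^3}\, \mathbf{k}_\perp \cdot \mathbf{k}_\perp P_\phi(|\mathbf{k}|)\, e^{i\mathbf{k}\cdot\mathbf{r}'} \tag{A24}$$

where the auto-correlation function of the potential is

$$\xi_\phi(\mathbf{r}') = \langle\phi(\mathbf{r})\phi(\mathbf{r} + \mathbf{r}')\rangle = \int \frac{d^3k}{(2\pi)^3}\, P_\phi(|\mathbf{k}|)\, e^{i\mathbf{k}\cdot\mathbf{r}'}. \tag{A25}$$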


The quantity that appears in the expression (A21) for
$\Delta r$ above, when averaged, is the two-point function of the
transverse gradients of the potential at two points with
separation parallel to the path. Taking the potential auto-correlation
function to be locally isotropic, $\xi_\phi(\mathbf{r}) = \xi_\phi(r)$,
and using the standard expression for an isotropic function,
$\nabla^2\xi = \xi'' + 2\xi'/r$, we have

$$\langle\nabla_\perp\phi(\lambda) \cdot \nabla_\perp\phi(\lambda')\rangle = -\nabla_\perp^2 \xi_\phi(\mathbf{r} = \hat{\mathbf{z}} y) = -2\xi'_\phi(y)/y \tag{A26}$$

where $y = \lambda' - \lambda$ and $\xi'_\phi(y) = d\xi_\phi(y)/dy$.

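It follows that the perturbation to the mean distance
reached after propagating a fixed physical path length $\lambda_0$ is

$$\langle\Delta r\rangle = -\frac{4}{\lambda_0} \int_0^{\lambda_0} d\lambda\, (\lambda_0 - \lambda) \int_0^\lambda d\lambda'\, \lambda' \langle\nabla_\perp\phi \cdot \nabla_\perp\phi'\rangle$$
$$\phantom{\langle\Delta r\rangle} = \frac{8}{\lambda_0} \int_0^{\lambda_0} d\lambda\, (\lambda_0 - \lambda) \left[\int_{-\lambda}^0 dy\, \xi'_\phi + \lambda \int_{-\lambda}^0 dy\, \xi'_\phi/y\right]$$
$$\phantom{\langle\Delta r\rangle} \simeq \frac{8}{\lambda_0} \int_0^{\lambda_0} d\lambda\, (\lambda_0 - \lambda) \left[\xi_\phi(0) + \lambda \int_{-\infty}^0 dy\, \xi'_\phi/y\right]. \tag{A27}$$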

In the last step, we are invoking the idea that the range
of correlations of the potential is limited, so for any $\lambda$ substantially
greater than the correlation length, the integrals
will have converged so we can take the lower limit in the
integration over separation $y$ to be minus infinity.

Finally, in the same spirit, the second term in parentheses
[...] will be much greater than the first. For example,
if one were to consider a simple model of ‘blobs’ of some
characteristic size $L$ and randomly chosen potential with
root mean squared value $\phi$ one will have $\xi_\phi(0) \sim \phi^2$ and
$\lambda \int dy\, \xi'_\phi/y \sim (\lambda/L)\phi^2 \gg \phi^2$. Dropping the smaller term
then gives

$$\langle r\rangle = \lambda_0 - \frac{1}{\lambda_0} \int_0^{\lambda_0} d\lambda\, \lambda(\lambda_0 - \lambda) J(\lambda) \tag{A28}$$

where we have defined

$$J(\lambda) \equiv -8 \int_{-\infty}^0 dy\, \xi'_\phi(y; \lambda)/y. \tag{A29}$$


The minus sign makes $J(\lambda)$ a positive quantity, and the
notation $\xi_\phi(y; \lambda)$ is meant to indicate that the two-point
function has a strong dependence on separation $y$ but may
also have a weaker secular trend with conformal look-back
time $\lambda$.

Equation (A28) gives the ensemble mean of the distance
reached at second order in the metric perturbation $\phi$ and is
also obtained assuming a coherence length $L \ll \lambda$, so higher
order corrections to this are smaller by at least one power
of $L/\lambda$.
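To make the definition (A29) concrete, the following sketch evaluates $J$ for a Gaussian correlation function $\xi_\phi(y) = \sigma^2 \exp(-y^2/2L^2)$. The Gaussian form, the values of `sigma` and `L`, and the quadrature are assumptions made only for this example; for that choice the integral can also be done by hand, giving $J = 4\sqrt{2\pi}\,\sigma^2/L$, which is consistent with the ‘blobs’ estimate $J \sim \phi^2/L$.

```python
import math

def xi_prime(y, sigma=1e-5, L=1.0):
    # Derivative of the hypothetical Gaussian correlation function
    # xi(y) = sigma^2 exp(-y^2 / (2 L^2)).
    return -(y / L**2) * sigma**2 * math.exp(-y**2 / (2 * L**2))

def J_from_xi(sigma=1e-5, L=1.0, ymax=10.0, n=20000):
    # Equation (A29): J = -8 * integral_{-inf}^{0} dy xi'(y) / y,
    # truncated at y = -ymax * L and evaluated with the midpoint rule.
    h = ymax * L / n
    total = 0.0
    for i in range(n):
        y = -(i + 0.5) * h
        total += xi_prime(y, sigma, L) / y * h
    return -8.0 * total

def J_analytic(sigma=1e-5, L=1.0):
    # Closed form for the Gaussian model only.
    return 4.0 * math.sqrt(2.0 * math.pi) * sigma**2 / L

if __name__ == "__main__":
    print(J_from_xi(), J_analytic())
```

The result is positive, as the sign convention in (A29) requires, and scales as $1/L$ at fixed potential variance.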


A4.1 Rate of increase of deflection variance

The potential here is dimensionless, so $J$ has units of inverse
length. In the ‘random blobs’ model $J \sim \phi^2/L$ while
the deflection angle for a ray passing through a single blob
is $\Delta\theta_1 \sim \phi$. For random blobs the cumulative deflection
performs a random walk, and $J$ is the rate at which the
cumulative deflection squared grows with path length. This
can be made more precise: From the definition (A18) of the
deflection angle $\dot{\mathbf{x}}$ it follows that

$$\frac{d|\dot{\mathbf{x}}|^2}{d\lambda} = 8 \nabla_\perp\phi(\lambda) \cdot \int_0^\lambda d\lambda'\, \nabla_\perp\phi(\lambda'). \tag{A30}$$

Taking the ensemble average using (A26) gives

$$\frac{d\langle|\dot{\mathbf{x}}|^2\rangle}{d\lambda} = -16 \int_{-\lambda}^0 dy\, \frac{\xi'_\phi}{y} \simeq -16 \int_{-\infty}^0 dy\, \frac{\xi'_\phi}{y} = 2J \tag{A31}$$

where the approximation is good for any distance from the
observer much larger than the assumed small correlation
length. Thus $J$ is the rate of increase with path length of
the mean squared deflection (per component).
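The random-walk picture can be illustrated with a toy Monte Carlo. In this sketch each ‘blob’ is assumed to give an independent Gaussian kick of rms `phi` to one component of the deflection; the kick distribution, the trial count and the seed are all arbitrary choices made for illustration. After $N = \lambda/L$ blobs the mean squared deflection should be close to $N\phi^2$, i.e. it grows linearly with path length at a rate $\sim \phi^2/L$.

```python
import random

def mean_square_deflection(n_blobs, phi=1e-5, trials=3000, seed=1):
    # Monte Carlo estimate of <theta^2> for one deflection component after
    # traversing n_blobs independent blobs (hypothetical Gaussian kicks).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        theta = sum(rng.gauss(0.0, phi) for _ in range(n_blobs))
        total += theta * theta
    return total / trials

if __name__ == "__main__":
    for n in (100, 400):
        print(n, mean_square_deflection(n) / (n * 1e-10))
```

The printed ratios should scatter around unity at the few per cent level set by the finite number of trials.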

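The above formulae also provide a useful way to express
$J$ in terms of the power spectrum of the potential fluctuations
rather than in terms of the two-point correlation function.
With

$$J = -4 \int_{-\infty}^0 dy\, \nabla_\perp^2 \xi_\phi(y) = \lim_{\lambda\to\infty} 4 \left\langle\nabla_\perp\phi(\lambda) \cdot \int_0^\lambda d\lambda'\, \nabla_\perp\phi(\lambda')\right\rangle \tag{A32}$$

and expressing the potentials here in terms of their Fourier
components we find

$$J = \lim_{\lambda\to\infty} 4 \int \frac{d^3k}{(2\pi)^3}\, k_\perp^2 P_\phi(k) \int_{-\lambda}^0 dy\, e^{-ik_z y}$$
$$\phantom{J} = \lim_{\lambda\to\infty} 4 \int \frac{d^3k}{(2\pi)^3}\, k_\perp^2 P_\phi(k)\, \frac{\sin k_z\lambda}{k_z}$$
$$\phantom{J} = \lim_{\lambda\to\infty} 4 \int d\ln k\; k^2 \Delta^2_\phi(k)\, \frac{\lambda}{2} \int d\mu\, (1 - \mu^2)\, \frac{\sin \mu k\lambda}{\mu k\lambda} \tag{A33}$$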


where we have defined the contribution to the potential
variance per log-interval of wave-number as $\Delta^2_\phi(k) =
k^3 P_\phi(k)/2\pi^2$. For $\lambda \to \infty$, and for finite $k$, the ‘sinc’ function
here has a narrow central lobe of width $\delta\mu = 1/(k\lambda) \ll 1$
and the integral has very little contribution from the oscillating
wings, so we can approximate the factor $1 - \mu^2$ by
unity and change the integration variable to obtain

$$J = 2 \int d\ln k\; k \Delta^2_\phi(k) \int_{-\infty}^{\infty} dy\, \frac{\sin y}{y} = 2\pi \int d\ln k\; k \Delta^2_\phi(k). \tag{A34}$$

The integrand here was plotted in an earlier figure.
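As a consistency check between the correlation-function form (A29) and the power-spectrum form (A34), the sketch below evaluates (A34) for the same hypothetical Gaussian model $\xi_\phi(r) = \sigma^2 \exp(-r^2/2L^2)$, whose power spectrum is $P_\phi(k) = \sigma^2 (2\pi)^{3/2} L^3 \exp(-k^2L^2/2)$. The model and the integration limits in $\ln k$ are assumptions of this example. For this model both routes should give $J = 4\sqrt{2\pi}\,\sigma^2/L$.

```python
import math

def delta2(k, sigma=1e-5, L=1.0):
    # Delta^2(k) = k^3 P(k) / (2 pi^2) for the hypothetical Gaussian model,
    # with P(k) = sigma^2 (2 pi)^{3/2} L^3 exp(-k^2 L^2 / 2).
    P = sigma**2 * (2 * math.pi) ** 1.5 * L**3 * math.exp(-(k * L) ** 2 / 2)
    return k**3 * P / (2 * math.pi**2)

def J_from_power(sigma=1e-5, L=1.0, n=20000, lnk_min=-12.0, lnk_max=4.0):
    # Equation (A34): J = 2 pi * integral d ln k  k Delta^2(k),
    # evaluated with the midpoint rule in ln(k L).
    h = (lnk_max - lnk_min) / n
    total = 0.0
    for i in range(n):
        k = math.exp(lnk_min + (i + 0.5) * h) / L
        total += k * delta2(k, sigma, L) * h
    return 2 * math.pi * total

def J_gaussian_closed_form(sigma=1e-5, L=1.0):
    # Value obtained from (A29) for the same Gaussian model.
    return 4.0 * math.sqrt(2.0 * math.pi) * sigma**2 / L

if __name__ == "__main__":
    print(J_from_power(), J_gaussian_closed_form())
```

The integrand $k\Delta^2_\phi(k)$ peaks near $k \sim 2/L$ for this model, so the deflection variance is dominated by modes close to the coherence scale.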


A4.2 Perturbation to the area of constant distance
travelled

The average $\langle r\rangle$ in (A28) is the ensemble average for the
distance reached by a ray fired off in a fixed direction from
an observer (i.e. we are averaging over an ensemble of realisations
of the potential field). What we are primarily interested
in here is the ensemble average of the area per
unit solid angle $\langle dA/d\Omega\rangle$ or, dividing by the constant unperturbed
distance squared, we wish to determine $\langle dA/dA_0\rangle$
where $dA_0 = \lambda_0^2\, d\Omega$.

The area $dA$ that is the intersection of the bundle with
the surface $\lambda = \lambda_0$ lies at a distance $r = \lambda_0 + \Delta r$ from the observer
and, as we have discussed, has a normal with no first
order deviation from the direction away from the observer.
Writing $dA = r^2 d\Omega'$ - i.e. defining $d\Omega'$ to be the solid angle
that this area would subtend at this distance if there were
no light deflection - we have $dA/dA_0 = (d\Omega'/d\Omega)(r^2/\lambda_0^2)$.

Since the perturbation to $r$ is already second order, the
ensemble average of $dA/dA_0$ accurate to second order is

$$\left\langle\frac{dA}{dA_0}\right\rangle = \left\langle\frac{d\Omega'}{d\Omega}\right\rangle + 2\left\langle\frac{d\Omega'}{d\Omega}\frac{\Delta r}{\lambda_0}\right\rangle + \ldots \tag{A35}$$

Now the factor $d\Omega'/d\Omega$ here is, at linear order in the
potential, just $1 - 2\kappa$ with $\kappa$ the convergence. This has an
(ensemble) expectation value that vanishes at first order.
But we are working to second order precision here and one
might imagine that there would be a significant second order
contribution to $\langle d\Omega'/d\Omega\rangle$.

But in fact - and this is critical in what follows -
$\langle d\Omega'/d\Omega\rangle$ is precisely unity. This is because the process generating
realisations of the potential field is symmetric with
respect to the observer; there is no preferred direction. So an
equivalent to generating realisations of $\phi(\mathbf{r})$ and averaging
quantities for a single direction from the observer is to generate
realisations and then, for each of these, average over
all directions from the observer. But in doing so it is guaranteed
that, in the absence of multiple imaging, the sum
of $d\Omega'$ will be $4\pi$ since there is a one-to-one mapping with
lensing simply rearranging the sky without duplication and
without missing any regions. Thus $\langle d\Omega'/d\Omega\rangle = 1$ and the
desired expectation is

$$\left\langle\frac{dA}{dA_0}\right\rangle = 1 + 2\left\langle\frac{d\Omega'}{d\Omega}\frac{\Delta r}{\lambda_0}\right\rangle + \ldots = 1 + 2\frac{\langle\Delta r\rangle}{\lambda_0} + \ldots \tag{A36}$$

where ... denotes terms of higher than second order in the
potential.

This then yields the fractional perturbation to the area
of the sphere of constant physical path length $\lambda_0$:

$$\frac{\langle\Delta A\rangle}{A_0} = -\frac{2}{\lambda_0^2} \int_0^{\lambda_0} d\lambda\, \lambda(\lambda_0 - \lambda) J(\lambda). \tag{A37}$$

This is on the order of $\sim \lambda_0 J$ (for constant $J$ it is $-\lambda_0 J/3$)
or roughly equal to the mean square deflection angle, though
the presence of the factor $\lambda(\lambda_0 - \lambda)$ means that lenses close
to the observer or the source have relatively small effect on
the distance.


A5 The area of the CMB photosphere 

We have calculated above how the wiggling of rays decreases
the area of the surface of constant path length as compared
to its value in an unperturbed universe or uniform refractive
medium. As we have discussed, the CMB photosphere is not
a surface of constant physical path length from the observer;
it is the surface of constant optical path length. To first order
in the potential the photosphere is the surface of conformal
path length $\lambda = \lambda_0 + 2 \int d\lambda\, \phi$.

To calculate the ensemble average of $dA'/dA_0$, where
$dA'$ is the intersection of the bundle of rays with the photosphere,
we proceed as follows: The area $dA$ at $\lambda = \lambda_0$
considered above is not perpendicular to the ray direction;
the corresponding area perpendicular to the ray at $\lambda = \lambda_0$
is $dA \times (1 - |\Delta\dot{\mathbf{x}}|^2/2)$ (correct to second order). The area of
the intersection of the photosphere with the bundle (which
is perpendicular to the beam direction) is given by that perpendicular
area times $1 + 2\theta\Delta\lambda$, where $\theta = \dot{A}/2A$ is the
expansion rate of the bundle. To zeroth order $\theta = 1/\lambda$, but
as $\Delta\lambda = 2 \int d\lambda\, \phi$ is first order in the potential we need
to consider the first order perturbation to the expansion
rate $\Delta\theta = -\lambda^{-2} \int d\lambda\, \lambda^2 \nabla_\perp^2\phi$, as shown in Appendix B.

And as $\Delta\lambda$ multiplies the zeroth order expansion we need
to compute this to second order. This is done by writing the
potential along the perturbed path as a Taylor expansion
about the unperturbed path with lowest order correction
$\Delta\phi = \Delta\mathbf{x} \cdot \nabla_\perp\phi + \Delta\lambda\, \partial\phi/\partial\lambda$ with $\Delta\mathbf{x} = -2 \int d\lambda'\, (\lambda - \lambda') \nabla_\perp\phi$
and with $\Delta\lambda = 2 \int d\lambda'\, \phi(\lambda')$. This gives for $\Delta\lambda$ correct to
second order

$$\Delta\lambda = 2 \int_0^{\lambda_0} d\lambda\, \phi - 4 \int_0^{\lambda_0} d\lambda\, \nabla_\perp\phi \cdot \int_0^\lambda d\lambda'\, (\lambda - \lambda') \nabla_\perp\phi' + 4 \int_0^{\lambda_0} d\lambda\, \frac{\partial\phi}{\partial\lambda} \int_0^\lambda d\lambda'\, \phi'. \tag{A38}$$

But on evaluating the ensemble average of the 2nd order
terms here - making the usual assumption in the first
that the range of correlations is small compared to $\lambda_0$ and
integrating the last one by parts - one finds that these both
involve $\xi_\phi(0)$, and give a contribution to $\Delta\lambda/\lambda_0$ only on the
order of $\phi^2$ in the blob model. They are therefore like the
first term in the [...] in the last line of (A27) and in the
same way we ignore such sub-dominant contributions. The
upshot is that we can just use the first order expression for
$\Delta\lambda$.


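Multiplying these factors, the ratio of the area of the
intersection of the bundle with the photosphere to the unperturbed
area is then, at second order,

$$\frac{dA'}{dA_0} = \frac{d\Omega'}{d\Omega} \left(1 + \frac{2\Delta r}{r}\right) \left(1 - \frac{|\Delta\dot{\mathbf{x}}|^2}{2}\right) \times \left(1 + 2(1/\lambda + \Delta\theta)\Delta\lambda\right). \tag{A39}$$

Taking the ensemble expectation value we will obtain
four second order contributions: The first, $2\langle\Delta r/r\rangle$, we have
already calculated. The second is $-\langle|\Delta\dot{\mathbf{x}}|^2\rangle/2$, with

$$\frac{\langle|\Delta\dot{\mathbf{x}}|^2\rangle}{2} = \frac{2}{\lambda_s^2} \int_0^{\lambda_0} d\lambda\, \lambda \int_0^{\lambda_0} d\lambda'\, \lambda' \langle\nabla_\perp\phi \cdot \nabla_\perp\phi'\rangle. \tag{A40}$$

We can evaluate this, in the limit that the correlation length
is much less than the path length, much as we did in the
calculation of $\langle\Delta r\rangle$. The leading order term is obtained by
replacing $\lambda'$ in the second integral by $\lambda$ and taking it outside
of the integral. Unlike the expression for $\langle\Delta r\rangle$ this involves
$\int_0^{\lambda_0} d\lambda \int_0^{\lambda_0} d\lambda' \ldots$ rather than $\int_0^{\lambda_0} d\lambda \int_0^{\lambda} d\lambda' \ldots$ so we end up
with the complete integral $\int dy\, \xi'_\phi(y)/y$ from $-\infty$ to $\infty$. The
result is

$$-\frac{\langle|\Delta\dot{\mathbf{x}}|^2\rangle}{2} = -\frac{1}{\lambda_0^2} \int_0^{\lambda_0} d\lambda\, \lambda^2 J(\lambda) \tag{A41}$$

where we have replaced $\lambda_s$ by $\lambda_0$ since the difference introduces
only higher order corrections.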


The sum of (A37) and (A41) gives the mean of the
perturbation to the area perpendicular to a beam that has
propagated a path length $\lambda_0$. We have calculated this using
the optical scalar equations in a separate appendix for the case
of constant $J$. The result is $\langle\Delta A\rangle/A_0 = -(2/3)\lambda_0 J$ which
agrees with what we find here.

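Next there is the cross term

$$2\langle\Delta\theta\,\Delta\lambda\rangle = -\frac{4}{\lambda_0^2} \int_0^{\lambda_0} d\lambda\, \lambda^2 \int_0^{\lambda_0} d\lambda'\, \langle\phi' \nabla_\perp^2\phi\rangle = \frac{2}{\lambda_0^2} \int_0^{\lambda_0} d\lambda\, \lambda^2 J(\lambda). \tag{A42}$$

This is just twice $\langle|\Delta\dot{\mathbf{x}}|^2\rangle/2$. Including this we find, for the
case that $J$ is non-evolving, that the sum of the effects so
far vanishes.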

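Finally we have the cross-term from $d\Omega'/d\Omega = 1 - 2\kappa$
and the first order time-delay term $2\Delta\lambda/\lambda$. This is

$$-\frac{4\langle\kappa\,\Delta\lambda\rangle}{\lambda_0} = -\frac{8}{\lambda_0^2} \int_0^{\lambda_0} d\lambda\, \lambda(\lambda_0 - \lambda) \int_0^{\lambda_0} d\lambda'\, \langle\phi' \nabla_\perp^2\phi\rangle = \frac{4}{\lambda_0^2} \int_0^{\lambda_0} d\lambda\, \lambda(\lambda_0 - \lambda) J(\lambda). \tag{A43}$$

This is just (minus) twice $2\langle\Delta r/r\rangle$.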

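Combining (A37), (A41), (A42) & (A43) we obtain the
final result

$$\frac{\langle\Delta A\rangle}{A_0} = \frac{1}{\lambda_0^2} \int_0^{\lambda_0} d\lambda\, \left(2\lambda(\lambda_0 - \lambda) + \lambda^2\right) J(\lambda). \tag{A44}$$

This result is of second order in the metric fluctuations and
is valid at leading order in the assumed small parameter
$L/\lambda$. For constant $J$ this is $\langle\Delta A\rangle/A_0 = +(2/3)\lambda_0 J$.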
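The bookkeeping of the four second order contributions can be checked numerically. The sketch below evaluates the line-of-sight integrals in (A37), (A41), (A42) and (A43) for an arbitrary user-supplied $J(\lambda)$ and sums them to give (A44). The function name `area_terms`, the dictionary keys and the constant test value of $J$ are illustrative assumptions, not part of the paper.

```python
def area_terms(J, lam0, n=20000):
    """Second order contributions to <Delta A>/A_0 for a given J(lambda).

    J    : callable returning the rate of growth of deflection variance
    lam0 : path length to the surface
    Returns a dict keyed by equation label; 'A44' is the sum of the others.
    """
    h = lam0 / n
    a37 = a41 = a42 = a43 = 0.0
    for i in range(n):
        lam = (i + 0.5) * h                            # midpoint rule
        w_mid = lam * (lam0 - lam) * J(lam) * h / lam0**2
        w_end = lam**2 * J(lam) * h / lam0**2
        a37 += -2.0 * w_mid    # distance reached, 2<Delta r>/lam0
        a41 += -1.0 * w_end    # tilt of the ray, -<|Delta xdot|^2>/2
        a42 += 2.0 * w_end     # expansion-rate x time-delay cross term
        a43 += 4.0 * w_mid     # convergence x time-delay cross term
    return {"A37": a37, "A41": a41, "A42": a42, "A43": a43,
            "A44": a37 + a41 + a42 + a43}

if __name__ == "__main__":
    # Hypothetical constant J, in units where lam0 = 1.
    for label, value in area_terms(lambda lam: 3e-7, 1.0).items():
        print(label, value)
```

For constant $J$ this should reproduce the statements in the text: (A37) gives $-\lambda_0 J/3$, the first three terms cancel, and the total is $+(2/3)\lambda_0 J$. Passing a $\lambda$-dependent `J` shows how evolution of the lensing structure reweights the result.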

We can see from this that the fractional change in area
of the photosphere depends only on $J$; that it is non-zero;
and that it is generally positive - so the effect of surface
wrinkling wins out over the competing effect of paths wiggling.
But, as anticipated in the order-of-magnitude argument
presented in the Introduction, it is extremely small
being only on the order of the cumulative deflection angle
squared.


A6 The area of surfaces of constant redshift 


As discussed in §A2 a surface of constant redshift differs
from a surface of constant cosmic time in that the 1st order
path length perturbation $\Delta\lambda$ that appears in (A42) and
(A43) is given by (A17) rather than (A16) and that this
surface is not perpendicular to the ray direction.

The angle $\Theta$ between the surface normal and the ray
direction is just the transverse gradient, at the end of the
ray, of the differential time delay in (A17) associated with
the ISW effect:


© = 


{a'/a)xo 




Ao 


(A45) 


which has mean squared expectation value 


(00 = 


^oia'/a)% 




A 


(A46) 


The upshot of this is that the fractional perturbation to
the area of a surface of constant redshift is given by an integral
along the line of sight identical to (A44), but including
a factor $1 + 2(\phi'/\phi)_\lambda/(a'/a)_{\lambda_0}$ in the integrand, plus $\langle\Theta^2\rangle/2$.
This results in a substantial reduction in the perturbation
to the area, as compared to that for a surface of constant
cosmic time, for sources at low redshift. But the conclusion
that the effect is of the same order of magnitude as the mean
squared deflection is unaltered.

All the above calculations have concentrated on the
surface around the observer where sources have redshift $z$.
But one could also consider the surface surrounding a single
source, on which all observers see the source to have redshift
$z$. It might be expected that the properties of these
surfaces would be equivalent, but this is not so. Consider
equation (A44): with $\lambda$ the distance from the observer, it
provides the ensemble average of the source-surface area per
unit solid angle at the observer. But with $\lambda$ interpreted as
the distance from the source it gives the ensemble average of
the observer-surface area per unit solid angle at the source
and these are not the same, since (A44) is not symmetric
under $\lambda \to \lambda_0 - \lambda$.

Why this should be so may be understood in the hypothetical
situation where the lenses only develop very recently.
In that case the observer experiences very little perturbation
to the source-surface area as both the wiggling
and wrinkling effects are suppressed (as compared to similar
lensing structures situated roughly mid-way between the
sources and the observer). The surface of a pulse of radiation
from a source, on the other hand, passes through a shell
of inhomogeneity just before it reaches the surface containing
the observers who see it to have redshift $z$. This induces
little distance-reached perturbation, but does cause the surface
to be wrinkled, thus increasing the area and thereby
decreasing the mean flux density.

So the average of flux densities of the sources at redshift
$z$ seen by one observer is, in the limit that the structure
appeared very recently, exactly unperturbed. But the
flux densities averaged over an ensemble of sources and the
observers who see those sources to have redshift $z$ are biased.
This may sound paradoxical, but is not. The locations of the
observers in the latter case are not random; where they lie is
correlated with the location of the source and the potential
fluctuations.

The distinction is of course largely academic since the
effect is so small. But we would argue that what is relevant
observationally is the average over sources for one observer
(us) rather than the average over an ensemble of extra-terrestrial
observers.


APPENDIX B: THE RATE OF EXPANSION OF 
A BUNDLE OF RAYS 


This appendix provides the first order expansion of a bundle 
of rays that was used in the previous appendix. 

Consider a narrow cone of rays that leave the observer,
propagating backwards in time, with central ray initially
along the $z$-axis, and label these rays by their initial direction
$\boldsymbol{\Theta}$. After propagating a distance $\lambda$ from the observer
through a refractive medium with refractive index
$n(\mathbf{r}) = 1 - 2\phi(\mathbf{r})$ the transverse displacement of the ray
with initial direction $\boldsymbol{\Theta}$, relative to the location of the central
ray, will be, to first order in the potential

$$\Delta\mathbf{x} = \boldsymbol{\Theta} \cdot \left[\lambda \mathbf{I} - 2 \int_0^\lambda d\lambda'\, \lambda'(\lambda - \lambda') \nabla_\perp\nabla_\perp\phi(\lambda')\right]. \tag{B1}$$

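The transformation from solid angle to areas perpendicular
to the central beam is the Jacobian: $A = |\partial\Delta\mathbf{x}/\partial\boldsymbol{\Theta}|\,\Omega$,
so the area of the beam bundle is proportional to the determinant
of the matrix [...] above which, working to linear
order, says

$$A(\lambda) = \lambda^2\Omega \left[1 - \frac{2}{\lambda} \int_0^\lambda d\lambda'\, \lambda'(\lambda - \lambda') \nabla_\perp^2\phi(\lambda')\right]. \tag{B2}$$

But this can also be expressed as

$$A(\lambda) = \lambda^2\Omega \left[1 + 2 \int_0^\lambda d\lambda'\, \Delta\theta(\lambda')\right] \tag{B3}$$

where what we shall call the ‘linearised perturbation to the
rate of expansion’ is

$$\Delta\theta(\lambda) = -\frac{1}{\lambda^2} \int_0^\lambda d\lambda'\, \lambda'^2 \nabla_\perp^2\phi(\lambda'). \tag{B4}$$

The equivalence of (B2) and (B3) is easily established
by integrating the double integral obtained by substituting
(B4) in (B3) by parts.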
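The equivalence of (B2) and (B3) can also be confirmed numerically. In the sketch below `lap_phi` is a hypothetical profile for the transverse Laplacian $\nabla_\perp^2\phi$ along the central ray, chosen only for illustration, and `midpoint` is a simple quadrature helper; neither comes from the paper.

```python
import math

def lap_phi(s):
    # Hypothetical transverse Laplacian of the potential along the ray.
    return 1e-5 * math.sin(2.0 * s)

def midpoint(f, a, b, n=400):
    # Composite midpoint rule on [a, b].
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def area_factor_b2(lam):
    # A / (lam^2 Omega) from equation (B2).
    return 1.0 - (2.0 / lam) * midpoint(
        lambda s: s * (lam - s) * lap_phi(s), 0.0, lam)

def delta_theta(lam):
    # Linearised perturbation to the expansion rate, equation (B4).
    return -midpoint(lambda s: s * s * lap_phi(s), 0.0, lam) / lam**2

def area_factor_b3(lam):
    # A / (lam^2 Omega) from equation (B3).
    return 1.0 + 2.0 * midpoint(delta_theta, 0.0, lam)

if __name__ == "__main__":
    print(area_factor_b2(2.0) - 1.0, area_factor_b3(2.0) - 1.0)
```

The two printed perturbations should agree to within the quadrature error, which is much smaller than the perturbation itself.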

The meaning of the terminology is that if we define the
‘expansion rate’ for the beam as $\theta = \dot{A}/2A$ with $\dot{A} = dA/d\lambda$
(analogous to the Hubble expansion rate) then from (B3) at
linear order $\theta = \theta_0 + \Delta\theta + \ldots$ with zeroth order expansion
rate $\theta_0 = 1/\lambda$.


In CUMD14, the expansion is defined as $-d\ln A/d\lambda$
which is minus twice our definition. In their appendix D they
find that the perturbation to the area of the constant-$z$ surface
is given by $\langle(2 \int d\lambda\, \Delta\theta)^2\rangle$ (in our terminology) or equivalently
$\langle\Delta A/A\rangle = 4\langle\kappa^2\rangle$. We have already reached a very
different conclusion, which we confirm below using an
independent approach that is closer to that of CUMD14.


APPENDIX C: THE GEODESIC DEVIATION 
APPROACH 


The mean (inverse) magnification calculated by MS97 is
qualitatively similar to our result (31) but differs in detail
and predicts much stronger mean inverse amplification for
very nearby lenses. To try to resolve this discrepancy, we first
cast the MS97 analysis in the notation used here, where e.g.
we work with the spatial auto-correlation function of the
potential rather than the power spectrum.
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Cl Metcalf & Silk’s analysis 

As above, we consider rays close to a guiding ray that 
propagates initially along the 2 :-axis (here we will use 3- 
dimensional comoving coordinates r = {a;i,a; 2 ,A} with 
|r| = x) then the geodesic equation for the transverse dis¬ 
placement of the guiding ray is xq = —2Vx((). Integrating 
the geodesic equation gives the transverse velocity of a ray 
with initial direction n = {0i, 02,1} at A = 0 


which is |D| but, as discussed earlier with D = I-|-Si-|- 
S 2 -f ... where the subscripts ‘1’ and ‘2’ denote the first and 
second order (in (j)) terms appearing in (C41. The inverse 
amplification is = |Dj = 1-1- Tr(S) -I- |S| = 1 -|- Tr(Si -f 
S 2 ) -|- |Si|, plus terms of cubic and higher order in (j). 

It would seem that the expectation value of the trace of 
Si vanishes as it is a first order quantity. And the expecta¬ 
tion value of I Si I is 


k(A) = 0 - 2 / dA' V 




(Cl) 


-^0 

|Si|) = A J dX (Ao-A)A 
^ 0 


and integrating once more by parts gives a displacement 

A 

x(A) = ©A - 2 dA'(A - A') (C2) 


•^0 

J dA' (Aq — \')X' {4>ll(j}22 ~ 4>124’2 i)- 


(C5) 


0 

The integration is taken along the path, which to obtain 
x(A) to 2nd order in the potential can be taken to be the first 
order perturbed path, i.e. (j) must be evaluated at r = Az-l-x. 
The location of the end of the ray after propagating a path 
length Ao is therefore 


But it is easy to show that, for a statistically homogeneous 
potential this vanishes as, in Fourier space, the derivatives 
become multiplication by the transverse components of k. 

Expressing the correlation of third and first derivatives 
appearing in S 2 in terms of the power spectrum also shows 
that 


r(Ao) = (z + 0)Ao - 2y dA (Ao - A)Vl<(>(^(z -f ©)A 
0 

A 

-2 j d\' {X- A')yL<?!)((z -I- ©)A')^ 

0 

= (z + ©)Ao — 2 f dX (Ao — A)Vl(J 


Aq a 

-f 4 y dA (Ao - A) y dA' (A - A')Vx Vl 0 • Vx.^^' 


(C3) 


where, in the last expression, all of the potentials are to be 
understood as being evaluated along the undeflected path 
with initial direction z + &. 

Differentiating with respect to the assumed infinitesimal 
© gives the distortion tensor (the derivative of 2-D deflection 
X = (1 — zz')r): 


(Vx Vx Vx<(>(r) • Vx<(>(r-h r')) 

= -(VxVx<(>(r) • Vx Vx<(-(r -h r')) 
= -(<(.(r)VxVxVx"<(>(r-f r')) 

= -VxVxVx"C.A(r') . 


(C6) 


We can now see that, when we take the expectation value of 
the final line in (C4|, there will be almost complete cancel¬ 


lation if the range of correlations is limited (since for corre¬ 
lated pairs of points A ~ A'). 

We also see in ( |C4[ | that there are two ‘post-Born’ ef¬ 
fects. One comes from the beam being displaced laterally. 
The other comes from the change in the area of the beam. 
But from ( |C6[ ) these are almost exactly the same but of 
opposite sign so the net effect is strongly suppressed. 

The trace of the mean distortion is therefore 


Aq 0 

(Tr(S 2 )) = - A y dA (Ao - A) y dt/ j/Vx'^^ny) (C7) 


-^0 

0 

4 r (C4) 

-f — / dA (Ao - A) / dA' (A - A') 

0 0 

X (AVxVx Vx<(> • Vx<(>' 4- A'VxVx-^i • Vx Vx<(>') 

as obtained by MS97 and where the potential is now evaluated along the $z$-axis.

They then proceeded to write $\phi$ as a Fourier synthesis to obtain the mean of the trace of the distortion tensor in terms of the power spectrum (the mean of the first order term on the first line here assumed to be vanishing). In doing so they take $\nabla_\perp$ to be the derivatives with respect to the transverse Cartesian coordinates $\mathbf x = \{x_1, x_2\}$.

Here what we actually want is the inverse magnification. 


with $\hat{\mathbf n}$ a unit vector along the line of sight and where we have changed the second integration variable from $\lambda'$ to $y = \lambda' - \lambda$. The Laplacian here is with respect to the transverse coordinates. For a spherically symmetric function $F(\mathbf r) = F(r)$ we have, on the line of sight $\mathbf x = 0$, $\nabla_\perp^2 F(r) = 2F'/r$, where $F'(r) = dF(r)/dr$, and $\nabla_\perp^2(\nabla_\perp^2 F(r)) = 8(F'/r)'/r$ so $\langle\mathrm{Tr}(\mathbf S_2)\rangle$ can also be expressed as

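$$
\langle\mathrm{Tr}(\mathbf S_2)\rangle = -\frac{32}{\lambda_0}\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\int_{-\lambda}^{0} dy\; y\,\big(\xi_\phi'/y\big)' . \tag{C8}
$$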
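The two identities for spherically symmetric functions used to pass from (C7) to (C8) are easy to check numerically. The sketch below is illustrative only: the Gaussian test profile, the evaluation point and the finite-difference step are arbitrary choices made here and do not come from MS97 or from the analysis above. It evaluates the transverse Laplacian and its square on the line of sight ($\mathbf x = 0$) with a five-point stencil and compares with $2F'/r$ and $8(F'/r)'/r$.

```python
import math

# Hypothetical test profile F(r) = exp(-r^2); any smooth spherically
# symmetric function would serve equally well.
def F(r):
    return math.exp(-r * r)

def F_prime_over_r(r):
    # F'(r)/r for the Gaussian profile: (-2 r e^{-r^2}) / r
    return -2.0 * math.exp(-r * r)

def d_F_prime_over_r(r):
    # d/dr of F'(r)/r
    return 4.0 * r * math.exp(-r * r)

def field(x1, x2, z):
    return F(math.sqrt(x1 * x1 + x2 * x2 + z * z))

def transverse_laplacian(f, x1, x2, z, h):
    # Five-point stencil acting on the two transverse coordinates only.
    return (f(x1 + h, x2, z) + f(x1 - h, x2, z)
            + f(x1, x2 + h, z) + f(x1, x2 - h, z)
            - 4.0 * f(x1, x2, z)) / (h * h)

def transverse_bilaplacian(f, x1, x2, z, h):
    # Apply the stencil twice to obtain the squared transverse Laplacian.
    inner = lambda a, b, c: transverse_laplacian(f, a, b, c, h)
    return transverse_laplacian(inner, x1, x2, z, h)

z0, step = 1.3, 1.0e-2  # arbitrary point on the line of sight and step size
lap_numeric = transverse_laplacian(field, 0.0, 0.0, z0, step)
lap_formula = 2.0 * F_prime_over_r(z0)
bilap_numeric = transverse_bilaplacian(field, 0.0, 0.0, z0, step)
bilap_formula = 8.0 * d_F_prime_over_r(z0) / z0
```

The numerical and analytic values agree up to the expected $O(h^2)$ truncation error of the stencil.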

And as above, if we assume that the potential fluctuations have a limited range of correlations, and as we are considering sources at great distances many times the correlation length, the mean inverse magnification will be well approximated by taking the lower limit on the $y$-integral to be $-\infty$, with the result

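$$
\begin{aligned}
\langle 1/\mu\rangle_\Omega &= 1 + \langle\mathrm{Tr}(\mathbf S_2)\rangle \\
&= 1 + \frac{32}{\lambda_0}\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\int_{-\infty}^{0} dy\;\xi_\phi'/y \\
&= 1 - \frac{4}{\lambda_0}\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\,J .
\end{aligned}
\tag{C9}
$$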


This is equivalent to MS97’s equations 8 & 9 but in a possibly slightly more transparent form.

Their analysis is very elegant, and seems straightforward in principle. And their result is, at least qualitatively, very similar to (31) in that the inverse magnification is a weighted line integral of the rate of increase of the mean squared deflection $J$ and is therefore clearly on the order of the mean squared cumulative deflection which we know to be tiny. But on closer inspection the formulae differ in the details of the weighting. In particular (C9) gives a much larger effect than (31) for nearby lenses; our (31) is relatively suppressed for nearby lenses at distance $\lambda_d$ by a factor $\sim \lambda_d/\lambda_0$. For nearby lenses, (C9) starts growing behind the deflection region but saturates at a constant value that is independent of $\lambda_0$, whereas (31) predicts an effect that decays asymptotically as $\sim 1/\lambda_0$ for large $\lambda_0$.

But in the intuitive picture that the key ingredients are the change in the distance reached because rays are wiggly and the angular deflection at the end causing the surface to be aspherical, it seems inevitable that the effect of structures close to the observer should be suppressed, at least qualitatively, as in (31).

There is also something rather strange about the MS97 result for the location of the end point of the central ray - the first two lines of (C3). If we write this as $\mathbf r(\lambda_0) = \mathbf r_0 + \mathbf r_1 + \mathbf r_2$ where the subscripts denote the order then the squared distance reached is $|\mathbf r(\lambda_0)|^2 = \mathbf r_0\cdot\mathbf r_0 + 2\mathbf r_0\cdot(\mathbf r_1 + \mathbf r_2) + \mathbf r_1\cdot\mathbf r_1 + \ldots$ But both $\mathbf r_1$ and $\mathbf r_2$ are perpendicular to $\mathbf r_0 = \hat{\mathbf z}\lambda_0$ so, up to 2nd order, $|\mathbf r|^2 = |\mathbf r_0|^2 + |\mathbf r_1|^2 = \lambda_0^2 + |\mathbf r_1|^2$; i.e. the distance reached is always greater than $\lambda_0$ whereas we would have expected the distance reached to have a negative second order perturbation because of the wiggliness of the rays.


C2 A partial resolution 

The last puzzle, at least, has a simple resolution. The gradient operators $\nabla_\perp$ in the last section were taken to be the derivative with respect to $\mathbf x$. But what appears in the geodesic equation are the gradients in the direction perpendicular to the instantaneous ray directions. As the rays have a first order deflection, this gradient is not perpendicular to the $z$-axis, so when applied to $\phi$ there will be a second order correction.

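In general $\nabla_\perp = \nabla - \hat{\mathbf n}(\hat{\mathbf n}\cdot\nabla)$ with $\hat{\mathbf n}$ the ray direction. If we consider a ray that arrives at the observer with $\hat{\mathbf n} = \hat{\mathbf z}$ then the deviation of the direction after propagating some path length will, to first order, be $\hat{\mathbf n} = \hat{\mathbf z} + \dot{\mathbf x}$, so

$$
\begin{aligned}
\nabla_\perp &= \nabla - (\hat{\mathbf z} + \dot{\mathbf x})(\hat{\mathbf z} + \dot{\mathbf x})\cdot\nabla \\
&= \nabla_{\mathbf x} - \dot{\mathbf x}(\hat{\mathbf z}\cdot\nabla) - \hat{\mathbf z}(\dot{\mathbf x}\cdot\nabla) + \ldots \\
&= \nabla_{\mathbf x} - \dot{\mathbf x}\,\partial_\lambda - \hat{\mathbf z}(\dot{\mathbf x}\cdot\nabla_{\mathbf x}) + \ldots
\end{aligned}
\tag{C10}
$$

where $\nabla_{\mathbf x} = \nabla - \hat{\mathbf z}(\hat{\mathbf z}\cdot\nabla)$ is the 2D gradient with respect to the Cartesian transverse coordinates $\mathbf x = \{x_1, x_2\}$, and where we have used $\hat{\mathbf z}\cdot\nabla = \partial/\partial z = \partial_\lambda + \ldots$ to first order. These are only correct to first order, but as they get applied to the potential that is all that we need. In the second order terms in (C3) and (C4) we can ignore the distinction between $\nabla_\perp$ and $\nabla_{\mathbf x}$. But working to 2nd order precision we need to keep track of the correction to the first order terms.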
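The expansion (C10) can be checked numerically by applying both forms of the operator to an arbitrary gradient vector. The following is a minimal sketch under stated assumptions: the vector `g` stands in for $\nabla\phi$ at a point, the transverse velocity is taken along a fixed unit direction scaled by a small parameter, and all numerical values are hypothetical. The exact projection perpendicular to $\hat{\mathbf z} + \dot{\mathbf x}$ and the first order form should differ only at $O(|\dot{\mathbf x}|^2)$.

```python
import math

def perp_exact(g, xdot):
    """Apply nabla - n (n . nabla) to a gradient g, with n along z_hat + xdot."""
    n = [xdot[0], xdot[1], 1.0]
    norm = math.sqrt(sum(c * c for c in n))
    n = [c / norm for c in n]
    n_dot_g = sum(a * b for a, b in zip(n, g))
    return [gi - ni * n_dot_g for gi, ni in zip(g, n)]

def perp_first_order(g, xdot):
    """First order form: nabla_x - xdot d_z - z_hat (xdot . nabla_x)."""
    gx, gy, gz = g
    vx, vy = xdot
    return [gx - vx * gz, gy - vy * gz, -(vx * gx + vy * gy)]

def max_difference(eps):
    g = [0.3, -0.7, 1.1]            # hypothetical gradient of the potential
    xdot = [0.6 * eps, 0.8 * eps]   # small transverse velocity of size eps
    exact = perp_exact(g, xdot)
    approx = perp_first_order(g, xdot)
    return max(abs(a - b) for a, b in zip(exact, approx))

err_coarse = max_difference(1.0e-3)
err_fine = max_difference(1.0e-4)
```

Reducing the deflection by a factor of ten reduces the discrepancy by roughly a factor of one hundred, as expected for a second order remainder.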

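In terms of Cartesian coordinate derivatives, the geodesic equation is

$$
\ddot{\mathbf r} = -2\big(\nabla_{\mathbf x} - \dot{\mathbf x}\,\partial_\lambda - \hat{\mathbf z}(\dot{\mathbf x}\cdot\nabla_{\mathbf x})\big)\phi \tag{C11}
$$

with integral, for initial direction $\dot{\mathbf r}(0) = \hat{\mathbf z}$

$$
\dot{\mathbf r} = \hat{\mathbf z} - 2\int_0^{\lambda} d\lambda'\,\big(\nabla_{\mathbf x} - \dot{\mathbf x}\,\partial_\lambda - \hat{\mathbf z}(\dot{\mathbf x}\cdot\nabla_{\mathbf x})\big)\phi' \tag{C12}
$$

so the transverse ‘velocity’ $\dot{\mathbf x} = \dot{\mathbf r} - \hat{\mathbf z}(\hat{\mathbf z}\cdot\dot{\mathbf r})$ is

$$
\dot{\mathbf x} = -2\int_0^{\lambda} d\lambda'\,\big(\nabla_{\mathbf x} - \dot{\mathbf x}\,\partial_\lambda\big)\phi' \tag{C13}
$$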


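which in (C12), and keeping only terms up to 2nd order in $\phi$, gives

$$
\dot{\mathbf r} = \hat{\mathbf z} - 2\int_0^{\lambda} d\lambda'\,\Big\{\nabla_{\mathbf x}\phi' + 2\big(\partial_\lambda\phi' + \hat{\mathbf z}\,\nabla_{\mathbf x}\phi'\cdot\big)\int_0^{\lambda'} d\lambda''\,\nabla_{\mathbf x}\phi''\Big\} \tag{C14}
$$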


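with integral

$$
\mathbf r(\lambda_0) = \hat{\mathbf z}\lambda_0 - 2\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\Big\{\nabla_{\mathbf x}\phi + 2\big(\partial_\lambda\phi + \hat{\mathbf z}\,\nabla_{\mathbf x}\phi\cdot\big)\int_0^{\lambda} d\lambda'\,\nabla_{\mathbf x}\phi'\Big\} \tag{C15}
$$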


where all of the potentials are to be understood as being evaluated along the actual perturbed path. In order to compute expectation values we need to work in terms of the potential along the unperturbed path $\mathbf r = \lambda\hat{\mathbf z}$ which is obtained by making a Taylor expansion of the first order term above (the correction to the second order term being of cubic order). The result is

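$$
\begin{aligned}
\mathbf r(\lambda_0) = \hat{\mathbf z}\lambda_0 - 2\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\Big\{&\nabla_{\mathbf x}\phi - 2\nabla_{\mathbf x}\nabla_{\mathbf x}\phi\cdot\int_0^{\lambda} d\lambda'\,(\lambda - \lambda')\,\nabla_{\mathbf x}\phi' \\
&+ 2\big(\partial_\lambda\phi + \hat{\mathbf z}\,\nabla_{\mathbf x}\phi\cdot\big)\int_0^{\lambda} d\lambda'\,\nabla_{\mathbf x}\phi'\Big\}
\end{aligned}
\tag{C16}
$$

where now all the potentials are to be evaluated on the unperturbed path.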


If we compare with (C3) - specialising to the case $\boldsymbol\Theta = 0$ as we are assuming here - we see that the terms on the last line are new and, in particular, there is now a 2nd order component of the displacement of the end of the ray parallel to the $z$-axis:

$$
\delta\mathbf r_2 = -4\hat{\mathbf z}\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\,\nabla_{\mathbf x}\phi\cdot\int_0^{\lambda} d\lambda'\,\nabla_{\mathbf x}\phi' \tag{C17}
$$
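which has a non-vanishing dot product with the unperturbed direction, so the squared distance reached is

$$
\begin{aligned}
|\mathbf r|^2 = \lambda_0^2 &- 8\lambda_0\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\,\nabla_{\mathbf x}\phi\cdot\int_0^{\lambda} d\lambda'\,\nabla_{\mathbf x}\phi' \\
&+ 4\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)\,\nabla_{\mathbf x}\phi\cdot\int_0^{\lambda_0} d\lambda'\,(\lambda_0 - \lambda')\,\nabla_{\mathbf x}\phi'
\end{aligned}
\tag{C18}
$$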
with expectation value

$$
\langle|\mathbf r|^2\rangle = \lambda_0^2 - 2\int_0^{\lambda_0} d\lambda\;\lambda(\lambda_0 - \lambda)\,J \tag{C19}
$$


which agrees with (A28). Note that this contains the lensing kernel, so nearby lenses do not contribute.
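The step from the squared distance (C18) to its expectation value (C19) can be sketched as follows. The only inputs are statistical homogeneity and a correlation length much shorter than the path. The shorthand $\mathcal K$ is introduced here purely for illustration, and the identification $J = 2\mathcal K$ in the last line is what agreement with (C19) requires rather than an independent statement about how $J$ is normalised in the main text.

```latex
\begin{aligned}
\langle\nabla_{\mathbf x}\phi(\lambda)\cdot\nabla_{\mathbf x}\phi(\lambda')\rangle
  &= -\nabla_\perp^2\xi_\phi(\lambda - \lambda'),
  \qquad
  \mathcal K \equiv -\int_{-\infty}^{\infty} dy\,\nabla_\perp^2\xi_\phi(y), \\
\Big\langle\nabla_{\mathbf x}\phi\cdot\int_0^{\lambda} d\lambda'\,\nabla_{\mathbf x}\phi'\Big\rangle
  &\simeq \tfrac{1}{2}\mathcal K,
  \qquad
\Big\langle\nabla_{\mathbf x}\phi\cdot\int_0^{\lambda_0} d\lambda'\,(\lambda_0 - \lambda')\,\nabla_{\mathbf x}\phi'\Big\rangle
  \simeq (\lambda_0 - \lambda)\,\mathcal K, \\
\langle|\mathbf r|^2\rangle
  &\simeq \lambda_0^2 - 4\lambda_0\mathcal K\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)
        + 4\mathcal K\int_0^{\lambda_0} d\lambda\,(\lambda_0 - \lambda)^2 \\
  &= \lambda_0^2 - 4\mathcal K\int_0^{\lambda_0} d\lambda\;\lambda(\lambda_0 - \lambda)
   = \lambda_0^2 - 2\int_0^{\lambda_0} d\lambda\;\lambda(\lambda_0 - \lambda)\,J
   \quad\text{if } J = 2\mathcal K .
\end{aligned}
```

The factor of one half in the first correlator arises because the inner integral covers only one side of the correlation peak, while in the second correlator the inner integral straddles it.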

Resolving the difference between the inverse amplification of MS97 and that obtained here is much more complicated. What one has to do is develop the 2nd order expression for the end-point of a ray with direction at the observer $\hat{\mathbf z} + \boldsymbol\Theta$ and then differentiate with respect to $\boldsymbol\Theta$. We shall not pursue that analysis here.


APPENDIX D: OPTICAL SCALARS AND THE 
FOCUSING THEOREM 

Here we consider the mean inverse magnification from the perspective of optical scalars - the rates of expansion, shear and possibly rotation of a bundle of light rays that appear in Raychaudhuri’s equation. This formalism was originally developed by Sachs (1961) in the context of propagation of gravitational radiation, but it applies for any massless field in the geometric optics limit. The optical scalar transport equations (see Schneider, Ehlers & Falco 1992, Narlikar 2010 for derivations) are particularly important in the present context since, as we have discussed, they are the basis for the ‘focusing theorem’ (Seitz, Schneider & Ehlers 1994), which appears to show that inhomogeneities cause systematic focusing of beams of light, and which underlies the claims of Clarkson et al. 2012 and CUMD14. Our goals here are to provide a check on the analysis in the main text; to show that there is no subtle relativistic effect hidden in these equations; and to elucidate the meaning of the focusing equation.

We first develop the optical scalar transport equations in the form appropriate for calculating distances and beam areas given some statistical prescription for the metric fluctuations as a function of background coordinates. We then solve these perturbatively, up to second order in the amplitude of the metric fluctuations, and compare with the results obtained in the main text.


D1 The optical scalar equations in the weak field 
limit 

As discussed in §A1, light rays propagating through a perturbed FRW background with statistically isotropic metric fluctuations are exactly equivalent to optics in a medium with refractive index $n(\mathbf r)$ and obey

$$
\ddot{\mathbf r} = \nabla_\perp\tilde n \tag{D1}
$$

where $\tilde n = \ln n$ and $\nabla_\perp = \nabla - \dot{\mathbf r}(\dot{\mathbf r}\cdot\nabla)$ is the derivative in the direction perpendicular to $\dot{\mathbf r}$. In terms of the metric (A1), $n = [(1 - 2\phi(\mathbf r)/c^2)/(1 + 2\phi(\mathbf r)/c^2)]^{1/2}$, with $\mathbf r$ being conformal background coordinates, and dot being derivative with respect to path length in these coordinates so $|\dot{\mathbf r}| = 1$.

The optical scalar equations are a set of coupled non-linear differential equations that describe the evolution of the rate of expansion, the vorticity and the rate of shear of a bundle of rays (here we are interested in a bundle of rays that left the observer, propagating backward in time, within a circular cone of infinitesimal solid angle $d\Omega$). These equations are of interest here because the rate of expansion can be integrated to give the area of the beam.

At some point $\lambda$ along the central (or ‘guiding’) ray (which we denote by subscript 0), and as illustrated in Figure D1, we can erect background spatial coordinates such that the $z$-axis points along the direction of the central ray, i.e. $\dot{\mathbf r}_0 = \hat{\mathbf z}$, and define the 2-D orthogonal coordinates $\mathbf x = \{x_1, x_2\}$ on the plane orthogonal to be $\mathbf x = \mathbf r - \hat{\mathbf z}(\hat{\mathbf z}\cdot\mathbf r)$. We set the origin of coordinates at the location of the central ray: $\mathbf x_0 = 0$.

Now consider a collection of neighbouring rays whose directions $\dot{\mathbf r}$ vary smoothly on the surface perpendicular to the central ray, so for infinitesimal displacements $\mathbf x$ they have orthogonal ‘velocity’ $\dot{\mathbf x} = \dot{\mathbf r} - \hat{\mathbf z}(\hat{\mathbf z}\cdot\dot{\mathbf r}) = \mathbf K\cdot\mathbf x$ where $\mathbf K$ is a $2\times 2$ matrix that we shall refer to as the ‘optical tensor’, and which is the derivative of the orthogonal ray velocity with respect to the orthogonal coordinates. Our first goal is to obtain a first order differential equation for how $\mathbf K$ changes with path length along the beam.


D1.1 The optical tensor transport equation

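At linear order in $\mathbf x$ the ray directions are $\dot{\mathbf r} = \hat{\mathbf z} + \dot{\mathbf x}$ and the perpendicular gradient operator is

$$
\nabla_\perp = \nabla_{\mathbf x} - \dot{\mathbf x}\,\partial_z - \hat{\mathbf z}(\dot{\mathbf x}\cdot\nabla_{\mathbf x}). \tag{D2}
$$

Let us now use this in the geodesic equation to propagate the guiding ray forward by a path length corresponding to a given interval of optical path (or phase $\Delta\varphi$ for a monochromatic source): $\Delta\lambda_0 = \Delta\varphi/n(0)$. To first order in $\Delta\lambda_0$ the new position, which we denote by a prime, is

$$
\mathbf r_0' = 0 + \dot{\mathbf r}_0\Delta\lambda_0 = \hat{\mathbf z}\Delta\lambda_0 \tag{D3}
$$

while the ray direction will be

$$
\dot{\mathbf r}_0' = \hat{\mathbf z} + \ddot{\mathbf r}_0\Delta\lambda_0 = \hat{\mathbf z} + \nabla_{\mathbf x}\tilde n(0)\Delta\lambda_0. \tag{D4}
$$

As the direction has changed we have a new plane perpendicular to the guiding ray, or equivalently tangent to the new wavefront, that is tilted with respect to the plane $z = \Delta\lambda_0$. The equation of this plane is

$$
z - \Delta\lambda_0 = h(\mathbf x) = -\Delta\lambda_0\,\nabla_{\mathbf x}\tilde n(0)\cdot\mathbf x. \tag{D5}
$$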

Now consider a neighbouring ray that pierces the surface $z = 0$ at $\mathbf x$ and propagate this to the new tangent plane. To first order in $\Delta\lambda_0$ and $\mathbf x$ this requires a path length $\Delta\lambda = \Delta\lambda_0(1 - \mathbf x\cdot\nabla_{\mathbf x}\tilde n(\mathbf x))$ - this can also be obtained from $\Delta\lambda = \Delta\varphi/n(\mathbf x)$. The advanced position and direction will be

$$
\mathbf r' = \mathbf x + \dot{\mathbf r}\Delta\lambda = \mathbf x + (\hat{\mathbf z} + \dot{\mathbf x})\Delta\lambda \tag{D6}
$$

$$
\dot{\mathbf r}' = \hat{\mathbf z} + \dot{\mathbf x} + \big(\nabla_{\mathbf x} - \dot{\mathbf x}\,\partial_z - \hat{\mathbf z}(\dot{\mathbf x}\cdot\nabla_{\mathbf x})\big)\tilde n\,\Delta\lambda \tag{D7}
$$

Figure D1. Illustration of a bundle of rays (thin curves) and associated wave-fronts (thick curves) and ray direction vectors $\dot{\mathbf r} = d\mathbf r/d\lambda$ (arrows). The base of each arrow is labelled by distance (physical for lumpy glass, background conformal for perturbed FRW) along the path. Close to the guiding ray the ray vectors will vary linearly with transverse displacement. The optical tensor $\mathbf K$ is the derivative of the ray direction with respect to coordinates $\mathbf x$ on the plane that is tangent to the wavefront at the location of the guiding ray. The optical tensor transport equation tells us how $\mathbf K$ evolves as the bundle propagates through any metric or refractive index fluctuations. Since rays are perpendicular to the wave-fronts, the transverse components of the direction of rays are the 2D gradient of the wave-front displacement from the tangent plane. It follows that the optical tensor is also the Hessian (2nd spatial derivative) matrix for this displacement.

One path forward at this point would be to apply rotations into the local coordinate system defined by the new tangent plane to obtain the difference in direction between this ray and the guiding ray $\dot{\mathbf x}'' = R(\dot{\mathbf r}') - R(\dot{\mathbf r}_0') = R(\dot{\mathbf r}' - \dot{\mathbf r}_0')$. This will be a linear function of the rotated displacement $\mathbf x'' = R(\mathbf r' - \mathbf r_0')$ with tensorial coefficient $\mathbf K''$ such that $\dot{\mathbf x}'' = \mathbf K''\cdot\mathbf x''$. The rate of change with path length of $\mathbf K$ then being $\dot{\mathbf K} = (\mathbf K'' - \mathbf K)/\Delta\lambda_0$.

But this rotation is an unnecessary complication since both of the vectors $\mathbf r' - \mathbf r_0'$ and $\dot{\mathbf r}' - \dot{\mathbf r}_0'$ are almost perpendicular to the original (unrotated) $z$-axis, so they only change quadratically with the angle. And the angle is first order in $\Delta\lambda_0$. So the vectors $\mathbf x''$ and $\dot{\mathbf x}''$ can be obtained at first order in $\Delta\lambda_0$ simply by projecting $\mathbf r'$ and $\dot{\mathbf r}'$ and $\dot{\mathbf r}_0'$ onto the original $z = 0$ surface to obtain $\mathbf x' = \mathbf r' - \hat{\mathbf z}(\hat{\mathbf z}\cdot\mathbf r')$ and so on.

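The transported transverse position and velocity are

$$
\mathbf x' = \mathbf x + \dot{\mathbf x}\Delta\lambda = (\mathbf I + \mathbf K\Delta\lambda)\cdot\mathbf x \tag{D8}
$$

$$
\dot{\mathbf x}' - \dot{\mathbf x}_0' = \dot{\mathbf x} + (\nabla_{\mathbf x} - \dot{\mathbf x}\,\partial_z)\tilde n(\mathbf x)\Delta\lambda - \nabla_{\mathbf x}\tilde n(0)\Delta\lambda_0. \tag{D9}
$$

Making a first order Taylor expansion $\nabla_{\mathbf x}\tilde n(\mathbf x) = \nabla_{\mathbf x}\tilde n(0) + (\mathbf x\cdot\nabla_{\mathbf x})\nabla_{\mathbf x}\tilde n(0)$, and realising that, at first order in displacement, $\dot{\mathbf x}\,\partial_z\tilde n(\mathbf x) = \dot{\mathbf x}\,\partial_z\tilde n(0)$ since $\dot{\mathbf x}$ is of first order, this is

$$
\dot{\mathbf x}' - \dot{\mathbf x}_0' = \mathbf x\cdot\big[\mathbf K + (\nabla_{\mathbf x}\nabla_{\mathbf x}\tilde n - \nabla_{\mathbf x}\tilde n\,\nabla_{\mathbf x}\tilde n - \mathbf K\,\partial_z\tilde n)\Delta\lambda\big] \tag{D10}
$$

where the penultimate term, which like the last, is non-linear in the metric fluctuations, comes from the first order (in $\mathbf x$ and $\tilde n$) difference between $\Delta\lambda$ and $\Delta\lambda_0$.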

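Writing the LHS as $\dot{\mathbf x}' - \dot{\mathbf x}_0' = \mathbf K'\cdot\mathbf x'$ and substituting $\mathbf x = (\mathbf I + \mathbf K\Delta\lambda)^{-1}\cdot\mathbf x'$ from (D8) on the RHS and linearising in $\Delta\lambda$, gives

$$
\mathbf K' = \mathbf K + \big[(\nabla_{\mathbf x}\nabla_{\mathbf x} - \mathbf K\,\partial_z)\tilde n - \nabla_{\mathbf x}\tilde n\,\nabla_{\mathbf x}\tilde n - \mathbf K\cdot\mathbf K\big]\Delta\lambda \tag{D11}
$$

or equivalently, with $\mathbf K' = \mathbf K + \dot{\mathbf K}\Delta\lambda$, we have the optical tensor transport equation

$$
\dot{\mathbf K} = (\nabla_{\mathbf x}\nabla_{\mathbf x} - \mathbf K\,\partial_z)\tilde n - \nabla_{\mathbf x}\tilde n\,\nabla_{\mathbf x}\tilde n - \mathbf K\cdot\mathbf K . \tag{D12}
$$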


The linear spatial derivative operator in the first term has a simple physical interpretation; it gives the second derivative of $\tilde n$ on the curved wavefront with respect to the tangent plane coordinates. The transport equation (D12) says that changes in $\mathbf K$ are driven by any transverse gradients of the refractive index on the wavefront surface that the beam encounters, which makes sense, but there is also the non-linear term $-\mathbf K\cdot\mathbf K$ which ‘drives’ changes in $\mathbf K$ even in the absence of refractive index variations. This also has a simple explanation; downstream of a refractive index fluctuation the ray directions are unchanging, but their transverse positions evolve according to (D8), so the gradient of the fixed transverse velocity with respect to the evolving $\mathbf x'$ coordinates must change.


D1.2 Optical scalar transport equations 


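The ‘optical scalar’ transport equations (Sachs 1961) are obtained by decomposing the optical tensor into the expansion rate $\theta = \mathrm{Tr}(\mathbf K)/2$ and the trace-free rate of shear $\boldsymbol\Sigma = \{\mathbf K\}$ where the curly braces around a matrix indicate the trace free projection: $\{\mathbf M\} = \mathbf M - \mathbf I\,\mathrm{Tr}(\mathbf M)/2$ (so $\boldsymbol\Sigma = \mathbf K - \theta\mathbf I$). Now for any trace-free $2\times 2$ matrix $\mathbf N = \{\{a, b\}, \{c, -a\}\}$ it is easy to see that $\mathbf N\cdot\mathbf N = -|\mathbf N|\mathbf I$, from which it follows that $\mathbf K\cdot\mathbf K = (\theta\mathbf I + \boldsymbol\Sigma)\cdot(\theta\mathbf I + \boldsymbol\Sigma) = (\theta^2 + \Sigma^2)\mathbf I + 2\theta\boldsymbol\Sigma$ where we have defined $\Sigma^2 = \mathrm{Tr}(\boldsymbol\Sigma\cdot\boldsymbol\Sigma)/2 = -|\boldsymbol\Sigma|$.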
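The algebra of this decomposition is simple enough to verify by brute force. The sketch below is a check written for this purpose, with randomly drawn matrices standing in for the optical tensor (the sampling range and seed are arbitrary). It confirms that $\mathbf K\cdot\mathbf K = (\theta^2 + \Sigma^2)\mathbf I + 2\theta\boldsymbol\Sigma$ with $\Sigma^2 = -|\boldsymbol\Sigma|$ holds for any $2\times 2$ matrix, including non-symmetric ones with a vorticity component.

```python
import random

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def decompose(K):
    """Split K into the expansion rate theta and the trace-free part Sigma."""
    theta = 0.5 * (K[0][0] + K[1][1])
    Sigma = [[K[0][0] - theta, K[0][1]],
             [K[1][0], K[1][1] - theta]]
    return theta, Sigma

def residual(K):
    """Largest element of K.K - [(theta^2 + Sigma^2) I + 2 theta Sigma]."""
    theta, Sigma = decompose(K)
    Sigma_sq = -det2(Sigma)  # Sigma^2 = Tr(Sigma.Sigma)/2 = -|Sigma|
    KK = matmul2(K, K)
    worst = 0.0
    for i in range(2):
        for j in range(2):
            ident = 1.0 if i == j else 0.0
            rhs = (theta ** 2 + Sigma_sq) * ident + 2.0 * theta * Sigma[i][j]
            worst = max(worst, abs(KK[i][j] - rhs))
    return worst

rng = random.Random(0)  # fixed seed so the check is reproducible
samples = [[[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]
           for _ in range(100)]
max_residual = max(residual(K) for K in samples)
```

The residual is at the level of floating point round-off for every sample.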

Taking the trace and trace-free projections of (D12) yields the coupled transport equations

$$
\dot\theta = \Big(\frac{\nabla_\perp^2}{2} - \theta\,\partial_\lambda\Big)\tilde n - |\nabla_{\mathbf x}\tilde n|^2/2 - \theta^2 - \Sigma^2 \tag{D13}
$$

$$
\dot{\boldsymbol\Sigma} = \big(\{\nabla_{\mathbf x}\nabla_{\mathbf x}\} - \boldsymbol\Sigma\,\partial_\lambda\big)\tilde n - \{\nabla_{\mathbf x}\tilde n\,\nabla_{\mathbf x}\tilde n\} - 2\theta\boldsymbol\Sigma \tag{D14}
$$

where we are now using $\nabla_\perp^2$ to denote the transverse Laplacian $\nabla_{\mathbf x}^2$ on the guiding ray (this is not the same as the dot product of the operator in (D2) with itself which, containing $\dot{\mathbf x}$, is position dependent) and $\partial_\lambda$ to denote derivative with respect to position along the guiding ray. The rate of shear tensor $\boldsymbol\Sigma$ being trace-free has three independent components which can be further decomposed to a 2-component shear that is sometimes represented as a complex number and a vorticity. We shall not use that decomposition and will just work with $\boldsymbol\Sigma$ as a tensor. But separating the expansion rate $\theta$ is useful, since unlike $\boldsymbol\Sigma$ it is non-vanishing in the unperturbed universe.

The form of (D13) & (D14) is a little different to e.g.


interpretation as the solid angle of the beam at the source or 


equations (6.6) of Blandford & Narayan (1986) which have the linear 2nd derivative terms and the terms involving $\theta^2$, $\Sigma^2$ and $\theta\boldsymbol\Sigma$ but are missing the other non-linear derivative terms. As we discuss shortly, these differences arise in part because the spatial derivatives here are with respect to conformal background coordinates rather than local proper coordinates; using the latter eliminates the derivative along the line of sight $\partial_\lambda$, but we are still left with the terms involving the square of the transverse gradient. It is certainly the case that, for lensing by random structures, these terms are smaller than both the linear 2nd derivative terms and the terms involving products of the cumulative rate of shear and expansion, but they still need to be kept here. If we ignore these terms we find that there is a contribution to the mean fractional area perturbation on the order $\phi^2(\lambda/L)^2$. This is smaller than the claims by e.g. CUMD14, which are $\langle\Delta A\rangle/A_0 \sim \phi^2(\lambda/L)^3$, but larger than the correct result which is $\sim \phi^2\lambda/L$.

Starting at some initial point on the central ray, and with some choice of orientation of the initial orthogonal coordinate system, then for a given log refractive index field $\tilde n(\mathbf r)$ one could integrate these equations, along with the geodesic equation to track the motion of the guiding centre, to transport $\theta$ and $\boldsymbol\Sigma$ along the ray.^

If the refractive index has no spatial gradients, equations (D13) & (D14) admit a solution $\theta = 1/\lambda$ and $\boldsymbol\Sigma = 0$. This is the appropriate initial condition for a narrow bundle of rays that leave the observer, and is the zeroth order solution about which we will develop our perturbative analysis. Note that in the case of an observer at the centre of a spherically symmetric ‘lens’ with $n(\mathbf r) = n(r)$ this will still be a solution. This is required by symmetry, and can be confirmed by calculation since for any spherically symmetric function $f(r = \sqrt{z^2 + |\mathbf x|^2})$ it is easily shown that $\nabla_\perp^2 f$ evaluated at $\mathbf x = 0$ is just $2(df/dr)/r$ so the transverse Laplacian of $\tilde n$ in (D13) is cancelled by the longitudinal gradient term $-2\theta\,\partial_\lambda\tilde n = -2\lambda^{-1}\partial_\lambda\tilde n$.
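As a quick sanity check on the zeroth order solution, one can integrate the unperturbed expansion equation $\dot\theta = -\theta^2$ numerically and compare with $\theta = 1/\lambda$. The sketch below uses a hand-written fourth order Runge-Kutta step; the start and end points and the number of steps are arbitrary illustrative choices.

```python
def rk4_step(deriv, lam, y, h):
    """One classical Runge-Kutta step for a scalar ODE dy/dlam = deriv(lam, y)."""
    k1 = deriv(lam, y)
    k2 = deriv(lam + 0.5 * h, y + 0.5 * h * k1)
    k3 = deriv(lam + 0.5 * h, y + 0.5 * h * k2)
    k4 = deriv(lam + h, y + h * k3)
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def unperturbed_expansion(lam, theta):
    # Expansion-rate equation with all refractive index gradients and shear set to zero.
    return -theta * theta

def propagate_theta(lam_start, lam_end, n_steps):
    """Integrate theta from lam_start, where theta = 1/lam_start, to lam_end."""
    h = (lam_end - lam_start) / n_steps
    lam, theta = lam_start, 1.0 / lam_start
    for _ in range(n_steps):
        theta = rk4_step(unperturbed_expansion, lam, theta, h)
        lam += h
    return theta

theta_end = propagate_theta(1.0, 10.0, 5000)  # compare with 1/10
```

Starting from $\theta = 1/\lambda$ at any point on the ray, the integration stays on the $1/\lambda$ curve, as it must if this is a solution.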

The reason that these equations are of interest to us is that, according to (D8), the area of the bundle evolves as $A' = A\,|\mathbf I + \mathbf K\Delta\lambda| = A(1 + \mathrm{Tr}(\mathbf K)\Delta\lambda + \ldots) = A(1 + 2\theta\Delta\lambda + \ldots)$, where $\ldots$ indicates terms of higher than 1st order in $\Delta\lambda$. Thus $\theta = \dot A/2A = \dot D/D$ where $D = \sqrt{A}$, which is why $\theta$ is called the expansion rate. Note that we are justified in calculating the first order change in the area using the projected, rather than rotated, coordinates here since the difference in the areas is second order.
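The statement that the area factor is $1 + 2\theta\Delta\lambda$ up to second order terms follows from the exact $2\times 2$ identity $|\mathbf I + \mathbf K\Delta\lambda| = 1 + \mathrm{Tr}(\mathbf K)\Delta\lambda + |\mathbf K|\Delta\lambda^2$. A minimal numerical illustration, with a hypothetical optical tensor chosen only for the example:

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def area_factor(K, dlam):
    """Determinant of I + K*dlam, i.e. the ratio A'/A after a step dlam."""
    M = [[1.0 + K[0][0] * dlam, K[0][1] * dlam],
         [K[1][0] * dlam, 1.0 + K[1][1] * dlam]]
    return det2(M)

# Hypothetical optical tensor; its trace gives theta = 0.2.
K_example = [[0.30, 0.05], [0.05, 0.10]]
theta_example = 0.5 * (K_example[0][0] + K_example[1][1])

def linear_error(dlam):
    """Difference between the exact area factor and 1 + 2*theta*dlam."""
    return area_factor(K_example, dlam) - (1.0 + 2.0 * theta_example * dlam)

err_small = linear_error(1.0e-3)
err_large = linear_error(1.0e-2)
```

The error of the linear approximation equals $|\mathbf K|\Delta\lambda^2$, so it falls by a factor of one hundred when the step is reduced by ten.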

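The solution of $\dot A/2A = \theta(\lambda) = \lambda^{-1} + \Delta\theta(\lambda)$ is

$$
A = \Omega\lambda^2\exp\Big(2\int d\lambda'\,\Delta\theta(\lambda')\Big) \tag{D15}
$$

where $\Omega$ is a constant of integration (which has an obvious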


^ There is a slight subtlety here in that one needs to keep track of the rotation of the perpendicular coordinate system as the central ray direction changes. The coordinate system we have used here is not tied to any neighbouring rays. Instead, the new coordinate axes $\{\hat{\mathbf x}_1', \hat{\mathbf x}_2'\}$, viewed as 3-vectors in $\mathbf r$-space, are, after propagating a distance $\Delta\lambda$, obtained from the unprimed ones by applying a rotation about the axis that is the cross product $\dot{\mathbf r}\times(\Delta\lambda\nabla_\perp\tilde n)$. This will not concern us here, however.


observer) and where $\Delta\theta$ must be obtained by solving (D13) & (D14). We will presently do this by means of expansion up to second order in the assumed small refractive index fluctuations. But first we make connection with the, arguably more elegant, relativistic treatment and discuss the interpretation of the ‘focusing theorem’.


D2 The focusing theorem 

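The rate of change with distance of $\dot D/D$ is $\dot\theta = \ddot D/D - (\dot D/D)^2 = \ddot D/D - \theta^2$ so, according to (D13),

$$
\ddot D/D = \Big(\frac{\nabla_\perp^2}{2} - \theta\,\partial_\lambda\Big)\tilde n - |\nabla_{\mathbf x}\tilde n|^2/2 - \Sigma^2 . \tag{D16}
$$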


This appears to differ from the usual expression (e.g. Schneider, Ehlers & Falco 1992)

$$
\ddot D/D = -R_{\alpha\beta}k^\alpha k^\beta/2 - \Sigma^2 \tag{D17}
$$


where $R_{\alpha\beta}$ is the Ricci tensor and $k^\alpha$ is the guiding ray 4-vector. In particular, the rate of expansion $\theta$ does not appear in (D17). The difference is partly because we are working in terms of the metric fluctuations - assumed to take the weak-field form - and in part because our $D$ is a distance in conformal background coordinate units whereas in (D17) the distance is in proper distance units. In the weak-field approximation $g_{rr} = 1 - 2\phi$, but $n^2 = (1 - 2\phi)/(1 + 2\phi)$ so at lowest order in the metric fluctuations $g_{rr} = n$ and physical distances are related to background distances by $d\lambda_* = n^{1/2}d\lambda$, so partial derivatives with respect to physical coordinates are $\nabla_{\mathbf x_*} = n^{-1/2}\nabla_{\mathbf x}$ and $\partial_{\lambda_*} = n^{-1/2}\partial_\lambda$. In terms of $D_* = n^{1/2}D$, (D16) becomes


7^ I ^ I 12 ^^2 


(D18) 


where $\nabla_*^2$ is the 3D Laplacian operator in physical coordinates $\nabla_*^2 = \nabla_{\mathbf x_*}^2 + \partial_{\lambda_*}^2$ and where, as in (D17), the rate of expansion no longer appears. Here dot denotes derivative with respect to background distance along the ray.


Equation (D17) is the basis for the focusing theorem (Seitz, Schneider & Ehlers 1994): since both terms on the RHS are negative for any sensible equation of state for matter, then, rather generally, $\ddot D/D < 0$. The first term in (D17) describes the local effect of matter within the beam while the second term is the integrated effect of tidal fields from matter outside the beam, or Weyl curvature, along the path of the beam. The focusing equation says that the latter can only act to enhance the local focusing by positive density matter and that, as compared to rays in Minkowski space-time where $\ddot D = 0$, beams are always focused (at least up until caustic formation). This result seems also to be in accord with calculations based on the lens equation (Schneider 1984; Ehlers & Schneider 1986; Seitz & Schneider 1992) that any lens will give rise to at least one image that is magnified. See Schneider, Ehlers & Falco (1992) and Narlikar (2010) for further discussion.

In the cosmological context, the width of an unperturbed beam in conformal (or co-moving) coordinates is $D = \sqrt{\Omega}\lambda$, so $\ddot D = 0$. The local tidal focusing, in this context, is caused by the density fluctuations around the mean value, which averages to zero. More interesting is the effect of the shear, which is cumulative and always negative. To get a sense of the size of the effect, we note that in the perturbative regime the linearised version of (D14) is $\dot{\boldsymbol\Sigma}_1 = \{\nabla_{\mathbf x}\nabla_{\mathbf x}\}\tilde n$ where $\tilde n \simeq -2\phi$. So, in a model of random positive or negative perturbations to the Newtonian potential with RMS value $\phi$ and characteristic scale $L$, $\boldsymbol\Sigma_1$ will perform a random walk and will have typical mean squared value $\langle\Sigma_1^2\rangle \sim N\phi^2/L^2$. But $N \sim \lambda/L$, so $\langle\Sigma_1^2\rangle \sim \phi^2\lambda/L^3 \sim \langle\kappa^2\rangle/\lambda^2$. There are other non-linear terms in (D16), but it is not difficult to show that their expectation values are all much smaller than $\langle\Sigma_1^2\rangle$. If the change to the distance is small we can approximate $\ddot D/D$ by $\ddot D/D_0$ and it then follows that (D16) after integration implies a perturbation to the mean of $D = \sqrt{A}$, or equivalently the mean angular diameter distance, that is on the order of the mean squared convergence $\langle\Delta D\rangle/D_0 \sim \langle\kappa^2\rangle$ with a numerical coefficient that is negative.
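The random walk scaling invoked here, a mean squared value growing linearly with the number $N$ of independent structures crossed, can be illustrated with a toy Monte Carlo. This is only a cartoon of the argument: each structure is assumed to contribute an independent kick of unit size and random sign, standing in for a shear increment of order $\phi/L$, and the trial counts and seed are arbitrary.

```python
import random

def mean_square_after(n_steps, n_trials, rng):
    """Average of (sum of n_steps random +/-1 kicks)^2 over n_trials walks."""
    total = 0.0
    for _ in range(n_trials):
        s = 0.0
        for _ in range(n_steps):
            s += rng.choice((-1.0, 1.0))
        total += s * s
    return total / n_trials

rng = random.Random(1)  # fixed seed for reproducibility
ms_100 = mean_square_after(100, 4000, rng)
ms_400 = mean_square_after(400, 2000, rng)
```

Quadrupling the number of kicks roughly quadruples the mean square, which is the $\langle\Sigma_1^2\rangle \propto N \sim \lambda/L$ behaviour used in the estimate above.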

Thus the optical scalar formalism shows, rather nicely, that structure causes a negative bias in the mean (direction averaged) apparent distance. But that should come as no particular surprise. As discussed in the Introduction and shown in §2.4 we expect $\langle\Delta D\rangle/D_0 = -\langle\kappa^2\rangle/2$ when averaged over directions simply because $D$ is the square root of the fluctuating area per unit solid angle. The obvious question is whether, as found by CUMD14, the optical scalar equations actually predict a decrease in the area of a surface of constant redshift. To answer this requires a more quantitative analysis.

In what follows we will show, using the optical scalar formalism, and in the perturbative regime, that the direction averaged perturbation to the area is much smaller than the distance perturbation $\langle\Delta D\rangle/D_0 \sim \phi^2\lambda^3/L^3$. In fact $\langle\Delta A\rangle/A_0$ is suppressed compared to $\langle\Delta D\rangle/D_0$ by two powers of $L/\lambda$ so $\langle\Delta A\rangle/A_0 \sim \phi^2\lambda/L$. This means that the next order terms, which include post-Born corrections, actually cancel (as was also seen in §2.3), but it gives a result in accord with simple-minded consideration of reduction in distance reached and wrinkling of surfaces, as presented in the appendix. We then show how, consistent with this, in perturbation theory the leading order distance bias is $\langle\Delta D\rangle/D_0 = \alpha\langle\kappa^2\rangle$ with numerical coefficient $\alpha = -1/2$ as one would expect if the distance bias comes from statistical fluctuations. But this seems to us to be a somewhat backward step; the distance bias may well not be well described by linear theory when small scale structure is taken into account. The un-focusing theorem $\langle\Delta A\rangle/A_0 = 0 + O(\phi^2\lambda/L)$ is, we will argue, more powerful.


D3 Perturbation analysis 

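As already mentioned, in the absence of refractive index fluctuations $\nabla\tilde n = 0$ the solution of (D13) & (D14) is $\theta = 1/\lambda$ and $\boldsymbol\Sigma = 0$. If we let $\theta = \theta_0 + \theta'$ with $\theta_0 = 1/\lambda$ and drop the prime the optical scalar equations become

$$
\frac{1}{\lambda^2}\frac{d\lambda^2\theta}{d\lambda} = \Big(\frac{\nabla_\perp^2}{2} - \frac{\partial_\lambda}{\lambda}\Big)\tilde n - \frac{|\nabla_{\mathbf x}\tilde n|^2}{2} - \theta\,\partial_\lambda\tilde n - \theta^2 - \Sigma^2 \tag{D19}
$$

$$
\frac{1}{\lambda^2}\frac{d\lambda^2\boldsymbol\Sigma}{d\lambda} = \big(\{\nabla_{\mathbf x}\nabla_{\mathbf x}\} - \boldsymbol\Sigma\,\partial_\lambda\big)\tilde n - \{\nabla_{\mathbf x}\tilde n\,\nabla_{\mathbf x}\tilde n\} - 2\theta\boldsymbol\Sigma . \tag{D20}
$$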


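We solve these in two steps. Dropping all of the second order terms above yields the first order solutions:

$$
\theta_1 = \frac{1}{2\lambda^2}\int_0^{\lambda} d\lambda'\,\lambda'^2\,\nabla_\perp^2\tilde n - \frac{\tilde n}{\lambda} + \frac{1}{\lambda^2}\int_0^{\lambda} d\lambda'\,\tilde n \tag{D21}
$$

$$
\boldsymbol\Sigma_1 = \frac{1}{\lambda^2}\int_0^{\lambda} d\lambda'\,\lambda'^2\,\{\nabla_{\mathbf x}\nabla_{\mathbf x}\}\tilde n \tag{D22}
$$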


where the integrals are taken along the undeflected path, i.e. $\tilde n = \tilde n(\mathbf r = \hat{\mathbf z}\lambda')$, and where we have integrated by parts to eliminate the longitudinal derivative in $\theta_1$, and where the spatial derivatives are with respect to the Cartesian coordinates. From now on we will consider $\tilde n$ to be the perturbation to the log refractive index.
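The integration by parts mentioned here replaces the longitudinal derivative term $-\lambda^{-2}\int_0^\lambda d\lambda'\,\lambda'\,\partial_{\lambda'}\tilde n$ by $-\tilde n(\lambda)/\lambda + \lambda^{-2}\int_0^\lambda d\lambda'\,\tilde n$. The sketch below checks that equality numerically for a hypothetical smooth profile along the line of sight; the sinusoidal form and its amplitude are illustrative assumptions, and Simpson's rule is used simply because it needs no external library.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

# Hypothetical log refractive index perturbation along the unperturbed ray.
def n_tilde(lam):
    return 1.0e-5 * math.sin(lam)

def dn_tilde(lam):
    return 1.0e-5 * math.cos(lam)

def longitudinal_term(lam):
    """-lam^-2 * integral of lam' * d(n_tilde)/dlam' from 0 to lam."""
    return -simpson(lambda s: s * dn_tilde(s), 0.0, lam) / lam ** 2

def integrated_by_parts(lam):
    """-n_tilde(lam)/lam + lam^-2 * integral of n_tilde from 0 to lam."""
    return -n_tilde(lam) / lam + simpson(n_tilde, 0.0, lam) / lam ** 2

differences = [abs(longitudinal_term(lam) - integrated_by_parts(lam))
               for lam in (0.5, 2.0, 7.0)]
```

The two forms agree to quadrature accuracy at each test distance.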

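We then insert these in the second order terms on the RHS of (D19) to obtain the second order solution

$$
\theta_{1+2} = \frac{1}{\lambda^2}\int_0^{\lambda} d\lambda'\,\lambda'^2\Big[\Big(\frac{\nabla_\perp^2}{2} - \frac{\partial_\lambda}{\lambda'}\Big)\tilde n - |\nabla_{\mathbf x}\tilde n|^2/2 - \theta_1\partial_\lambda\tilde n - \theta_1^2 - \Sigma_1^2\Big] \tag{D23}
$$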


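where $\theta_{1+2}$ includes the first order solution and where, to obtain 2nd order precision, we need to evaluate the first occurrence of $\tilde n$ along the 1st order perturbed path

$$
\mathbf r_1 = \hat{\mathbf z}\lambda + \int_0^{\lambda} d\lambda'\,(\lambda - \lambda')\,\nabla_{\mathbf x}\tilde n(\hat{\mathbf z}\lambda') \tag{D24}
$$

and we need to pay attention to first order perturbations to the derivatives $\nabla_\perp$ and $\partial_\lambda$ in $\nabla_\perp^2 - 2\lambda^{-1}\partial_\lambda$.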

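The beam area, after propagating a physical path length $\lambda$, is then from (D15) just $A = A_0 + \Delta A = \Omega\lambda^2 + \Delta A$ with

$$
\frac{\Delta A}{A_0} = 2\int_0^{\lambda} d\lambda'\,\theta_{1+2} + 2\Big(\int_0^{\lambda} d\lambda'\,\theta_1\Big)^2 + \ldots \tag{D25}
$$

and our ultimate goal is to obtain the expectation value of this in terms of the spatial autocorrelation function of the log refractive index fluctuations $\xi(\mathbf r - \mathbf r') = \langle\tilde n(\mathbf r)\tilde n(\mathbf r')\rangle$.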

At linear order the area perturbation is $\Delta A/A_0 = -2\kappa + \ldots$ so from (D25) we can identify the linear convergence $\kappa$ with minus the integral of the first order expansion $\kappa = -\int d\lambda'\,\theta_1(\lambda')$. So the ensemble average of the second term here is, at leading order, just $2\langle\kappa^2\rangle$. CUMD14 obtained an equation similar to (D25) - though they have a different numerical coefficient for the second term - and claimed that the first term vanishes at all orders in the perturbation. We will now show that this is not the case and that not only does the leading order contribution to $\Delta A/A_0$ from the second term in (D25) get cancelled by the first, but the next order terms - in an expansion of powers of the assumed small ratio $L/\lambda$ - cancel also.
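The structure of (D25) is just the second order expansion of the exponential in (D15). The sketch below makes that explicit with stand-in numbers: the integrals of the first and second order parts of the expansion are modelled as $a\epsilon$ and $b\epsilon^2$ for a bookkeeping parameter $\epsilon$, where $a$ and $b$ are hypothetical coefficients chosen only for illustration.

```python
import math

def exact_fractional_area(first_order, second_order):
    """Delta A / A_0 = exp(2 * integral of Delta theta) - 1."""
    return math.expm1(2.0 * (first_order + second_order))

def second_order_expansion(first_order, second_order):
    """Two leading terms: 2 * int(theta_{1+2}) + 2 * (int(theta_1))^2."""
    return 2.0 * (first_order + second_order) + 2.0 * first_order ** 2

def remainder(eps, a=-0.7, b=0.4):
    """Exact minus expanded result when int(theta_1) = a*eps, 2nd order part = b*eps^2."""
    first, second = a * eps, b * eps * eps
    return exact_fractional_area(first, second) - second_order_expansion(first, second)

rem_coarse = remainder(1.0e-2)
rem_fine = remainder(1.0e-3)
```

The remainder shrinks by about a factor of a thousand when $\epsilon$ is reduced by ten, confirming that the two terms displayed in (D25) capture everything up to second order.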

The calculation is conceptually straightforward, but tedious in the number of terms it generates (many of which cancel in the end). To reduce the effort, it helps to define $\Delta_a = \Delta A/A_0$ i.e. the fractional perturbation to the area at any point along the beam. The rate of change of this with path length is, from (D25), and to second order

$$
\dot\Delta_a = 2\theta_{1+2} + 4\theta_1\int_0^{\lambda} d\lambda'\,\theta_1 \tag{D26}
$$


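and differentiating once more, and using (D19), yields

$$
\begin{aligned}
\frac{1}{2\lambda^2}\frac{d\lambda^2\dot\Delta_a}{d\lambda} = {}&\Big(\frac{\nabla_\perp^2}{2} - \frac{\partial_\lambda}{\lambda}\Big)\tilde n - \frac{|\nabla_{\mathbf x}\tilde n|^2}{2} + \theta_1^2 - \Sigma_1^2 \\
&- \theta_1\partial_z\tilde n + \frac{2}{\lambda^2}\frac{d\lambda^2\theta_1}{d\lambda}\int_0^{\lambda} d\lambda'\,\theta_1
\end{aligned}
\tag{D27}
$$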

where we note that the large squared first order expansion and shear terms now appear with opposite sign so their leading order effects cancel. The next step is to work out the expectation values of each of the terms here. In doing so all but the first term can be calculated using the Born approximation, using (D21) to express $\theta_1$ in terms of the refractive index fluctuations. The first term is then calculated in the post-Born approximation and allowing for the first order perturbations to e.g. $\nabla_\perp$. This gives $d(\lambda^2\langle\dot\Delta_a\rangle)/d\lambda$ in terms of $\xi$ and shows how the large terms cancel. We then integrate to obtain $\langle\Delta_a\rangle$ and show that this agrees with what we found in the simpler calculation in the main text. The next few sub-sections give the details, term by term; §D4 takes stock.


D3.1 The last term 


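From (D21) the two factors in the last term above are

$$
\frac{2}{\lambda^2}\frac{d\lambda^2\theta_1}{d\lambda} = (\nabla_\perp^2 - 2\lambda^{-1}\partial_z)\tilde n \tag{D28}
$$

$$
\int_0^{\lambda} d\lambda'\,\theta_1 = \frac{1}{2\lambda}\int_0^{\lambda} d\lambda'\,\big[\lambda'(\lambda - \lambda')\nabla_\perp^2 - 2\big]\tilde n(\lambda') \tag{D29}
$$

and the expectation value of their product is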

A 


2 dx^e 


A2 dX 


- J dx' e[ 


2 


A 

j dX' X' 


1 / = —2Vx^o 

2[e]°A 


(D30) 


VxCa-a' - 


A2 


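where we have used $\langle(\partial_\lambda\tilde n(\lambda))\tilde n(\lambda')\rangle = -\partial_y\xi(\lambda - \lambda')$ and have also made use of the identities $\nabla_\perp^2\xi(y) = 2\xi'/y$, where $\xi' = d\xi/dy$, and $\nabla_\perp^4\xi(y) = 8(\xi'/y)'/y = 4(\nabla_\perp^2\xi)'/y$. The first term here is relatively large, scaling as $1/L^2$. This is one of a number of similar terms that cancel collectively.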

D3.2 Expectation value of $\Sigma_1^2$

We now turn to the three second order terms on the first line of (D27). Starting with $\Sigma_1^2$, this is


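$$
\begin{aligned}
\langle\Sigma_1^2\rangle &= \mathrm{Tr}(\langle\boldsymbol\Sigma_1\cdot\boldsymbol\Sigma_1\rangle)/2 \\
&= \frac{1}{2\lambda^4}\int_0^{\lambda} d\lambda'\,\lambda'^2\int_0^{\lambda} d\lambda''\,\lambda''^2 \\
&\qquad\times\mathrm{Tr}\big(\langle\{\nabla_{\mathbf x}\nabla_{\mathbf x}\}\tilde n'\cdot\{\nabla_{\mathbf x}\nabla_{\mathbf x}\}\tilde n''\rangle\big)
\end{aligned}
\tag{D31}
$$

Writing $\tilde n(\mathbf r) = (2\pi)^{-3}\int d^3k\,\tilde n_{\mathbf k}\exp(i\mathbf k\cdot\mathbf r)$ we have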


Si = - 


/ 


d'^fc 

(271): 


rUk 


(fc? - kl)/2 fcife 

fcafci - kl)/2 


(D32) 

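from which we find, on invoking statistical translational invariance: $\langle\tilde n_{\mathbf k}\tilde n^*_{\mathbf k'}\rangle = (2\pi)^3\delta(\mathbf k - \mathbf k')P_{\tilde n}(\mathbf k)$,

$$
\langle\Sigma_1^2\rangle = \frac{1}{4\lambda^4}\int_0^{\lambda} d\lambda'\,\lambda'^2\int_0^{\lambda} d\lambda''\,\lambda''^2\,\nabla_\perp^4\xi_{\lambda' - \lambda''} \tag{D33}
$$

where $\xi(\mathbf r) = (2\pi)^{-3}\int d^3k\,P_{\tilde n}(\mathbf k)\exp(i\mathbf k\cdot\mathbf r)$. The integral here scales as $L^{-3}$ so this is a very large term, but it is cancelled by an identical leading order term in $\langle\theta_1^2\rangle$.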


D3.3 Expectation value of $\theta_1^2 - \Sigma_1^2$

Using (D21) we find that the average of the term involving two Laplacians in $\theta_1^2$ is identical to $\langle\Sigma_1^2\rangle$ and we have, for the residual net effect,




nX — 


jiX'n' I 


1 \ ft ■\//2 ~ 

dX A V^n 


1 


nA — 


A 

J dX' n 


J 0 

2 


(D34) 


which has expectation value 

A A 


{el - E?) = y y dx' j dx" x"^vUx'-x" 


1 

V 


y dX! A'"V^6_v + 


0 


A2 

A A 


(D35) 


0 0 0 

The third term scales as $L^0$ and the last two scale as $L$ so all three are ignorable. The first two terms both scale as $L^{-1}$.


D3.4 Expectation value of $-\theta_1\partial_z\tilde n$

Next we consider


— 9idzh = —dzfi 




—+4"-a 


(D36) 


which has expectation value 


A 

- ^ i A'V^Ca-a' + ^ . (D37) 

0 

These scale as $L^{-2}$, $L^{-1}$ and $L^0$ respectively, so the last is ignorable.


D3.5 Expectation value of $(\nabla_\perp^2/2 - \lambda^{-1}\partial_\lambda)\tilde n - |\nabla_{\mathbf x}\tilde n|^2/2$


The first term on the RHS of (D27) is of first order in the
refractive index fluctuation, but acquires a non-zero average
at second order because $\tilde n$ must be evaluated on the first order
perturbed beam path (D24) and the spatial derivatives are
also perturbed.

The spatial derivative operators are, according to (D2),
perturbed at first order relative to the derivatives with respect
to the Cartesian coordinates:

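$\nabla_\perp^2 = \nabla_{\bf x}^2 - 2\dot{\bf x}_1\cdot\nabla_{\bf x}\partial_z,$ (D38)

$\partial_\lambda = \partial_z + \dot{\bf x}_1\cdot\nabla_{\bf x}$ (D39)

where the first order perturbation to the beam direction is

$\dot{\bf x}_1 = \int_0^\lambda d\lambda'\,\nabla_{\bf x}\tilde n'.$ (D40)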

Using these and making a Taylor expansion to obtain
$\tilde n$ along the first order perturbed path and discarding terms
that are higher than second order in $\tilde n$ gives


2 


9a 

A 


+ (^-^ 

'2 A 


V| 

2 

Vxh • 


j d\' (A - A')Vx 

0 

A 

- (9, + 1 /A) Vxh ■ J 


(D41) 


Vxh' 


Including the expectation value of $-|\nabla_{\bf x}\tilde n|^2/2$, which is just
$\nabla_{\bf x}^2\xi_0/2$, we obtain the expectation value for these final terms:

(D42)


D4 The final result for the mean area perturbation



Putting all the above pieces together - equations (D30), (D35),
(D37) and (D42) - the first thing we note is that the ‘large’
terms involving $\nabla_{\bf x}^2\xi_0$ and which scale as $L^{-2}$ all cancel.
So not only is there no very large Born-level contribution
from $\Sigma_1^2$ and $\theta_1^2$ scaling as $L^{-3}$, the next order $L^{-2}$
terms, including first post-Born approximation corrections,
also cancel out in the end. At leading order - i.e. at level
$\xi_0\lambda/L$ - the surviving net effect is equal to the first term in
equation (D35) for $\langle\theta_1^2 - \Sigma_1^2\rangle$:


1 d\^{AA} 
2 A 2 d\ 


y j d\' j dX” (D43) 


The Laplacian is a narrow function of width $\sim L$ so for
all $\lambda'$ except for within $\sim L$ of the observer or the source
sphere the second integral will have converged and will be
well approximated by $\lambda'^2 J$. We have assumed here that the
metric fluctuations are not evolving - this was quite hard
enough - so the first integral can be performed too to give


1 dX^{AA) 
2A2 dX 

where we have defined 

0 


2 J 


j = - 


dy Vlav) 


(D44) 


(D45) 


which is the same as the earlier definition since here $\xi = 4\xi_\phi$
and $\nabla_{\bf x}^2\xi(y) = 2\xi'(y)/y$.

Finally, integrating this gives 


$\langle\Delta_A\rangle = -2J\lambda/3.$


(D46) 


This is the ensemble average of the fractional perturbation
to the area at the end of a beam of path length $\lambda$ (physical
for glass, conformal background in cosmology). It is not
the same as the perturbation to the area of the surface of
constant path length, $2\langle\Delta r\rangle/r$ in (A37), as elements of area on that
surface are not perpendicular to the beam as is the case here.
So (D46) should be compared to the sum of (A37) and (A41):
2{Ar)/r — (Ax^)/2. They agree. This does not provide the 
full expression for the perturbation to a surface of constant 
redshift, or the cosmic photosphere. For that it is necessary 
to add the contributions arising from the fluctuating path 
length coming from time delays. This is done in the main 
part of the paper. 


D5 The focusing equation in the perturbative 
regime 

We can obtain the mean distance perturbation from the
focusing equation in the perturbative regime in much the
same way. The solution of $\dot D/D = \lambda^{-1} + \theta$ is
$D = \sqrt{\Omega}\,\lambda\exp(\int d\lambda\,\theta)$, and expanding this up to second order,
and defining $\Delta_D = (D - D_0)/D_0$, gives


Ao = y dX' 6 ) 1+2 +ijdX'ei\ +... 


(D47) 


rather like (D25). Differentiating this gives
1 dX^Ao 9 a 


A2 dX 


- IVxhp . 

2 


^2 , 1 rfA'^i 


A 

y dX' e[ 


(D48) 


which is very similar to (D27) but there is no longer cancellation
of the large leading order contribution from $\Sigma_1^2$ by $\theta_1^2$.
The results from the previous section show that the other
terms are relatively negligible (i.e. of higher order in $L/\lambda$), so

$\frac{1}{\lambda^2}\frac{d}{d\lambda}\left(\lambda^2\frac{d\Delta_D}{d\lambda}\right) \simeq -\Sigma_1^2$ (D49)

at leading order.

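On the other hand, the linear convergence is $\kappa(\lambda) = \int d\lambda'\,\theta_1(\lambda')$
so $d\kappa^2/d\lambda = 2\theta_1\int d\lambda'\,\theta_1(\lambda')$ from which we find

$\frac{1}{2\lambda^2}\frac{d}{d\lambda}\left(\lambda^2\frac{d\kappa^2}{d\lambda}\right) = \theta_1^2 + \frac{1}{\lambda^2}\frac{d(\lambda^2\theta_1)}{d\lambda}\int_0^\lambda d\lambda'\,\theta_1'.$ (D50)

But again the results from the previous section show that,
in the ensemble average sense, the second term is negligible
compared to the first and that, in the ensemble average,
$\langle\theta_1^2\rangle = \langle\Sigma_1^2\rangle$ to leading order. Thus we have

$\frac{1}{\lambda^2}\frac{d}{d\lambda}\left(\lambda^2\frac{d\langle\Delta_D\rangle}{d\lambda}\right) = -\langle\Sigma_1^2\rangle = -\langle\theta_1^2\rangle = -\frac{1}{2\lambda^2}\frac{d}{d\lambda}\left(\lambda^2\frac{d\langle\kappa^2\rangle}{d\lambda}\right).$ (D51)
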

This can be integrated twice, with appropriate boundary
conditions at the observer, to give, for the average of $\Delta_D = \Delta D/D_0$,

$\langle\Delta D\rangle/D_0 = -\langle\kappa^2\rangle/2$ (D52)

consistent with conservation of area. At the end of this journey,
we therefore have a conclusion consistent with the one
obtained previously by more elementary means: cosmological
inhomogeneities have no tendency to focus beams of light
in terms of changing their area.
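
The relation (D52) can be illustrated with a toy Monte Carlo. The sketch below is not the calculation performed above: it assumes the standard weak-lensing expression $\mu^{-1} = (1-\kappa)^2 - |\gamma|^2$ for the inverse magnification, takes $D/D_0 = \mu^{-1/2}$, and draws $\kappa$ and the two shear components as independent zero-mean Gaussians with $\langle|\gamma|^2\rangle = \langle\kappa^2\rangle$. All of these choices, and the function name, are our assumptions for illustration.

```python
import math
import random

def mean_distance_bias(sigma_kappa=0.05, n_pairs=100_000, seed=1):
    """Direction-averaged <D/D0 - 1> and <kappa^2> for a toy Gaussian model."""
    rng = random.Random(seed)
    sigma_gamma = sigma_kappa / math.sqrt(2.0)  # per shear component
    sum_dd = 0.0
    sum_k2 = 0.0
    for _ in range(n_pairs):
        kappa = rng.gauss(0.0, sigma_kappa)
        g1 = rng.gauss(0.0, sigma_gamma)
        g2 = rng.gauss(0.0, sigma_gamma)
        gamma_sq = g1 * g1 + g2 * g2
        # Use +kappa and -kappa together so the linear term cancels exactly.
        for k in (kappa, -kappa):
            inv_mu = (1.0 - k) ** 2 - gamma_sq
            sum_dd += math.sqrt(inv_mu) - 1.0
            sum_k2 += k * k
    n = 2 * n_pairs
    return sum_dd / n, sum_k2 / n

mean_dd, mean_k2 = mean_distance_bias()
print("<Delta D>/D0   :", mean_dd)
print("-<kappa^2>/2   :", -0.5 * mean_k2)
```

In this toy model the mean fractional distance perturbation is negative and close to $-\langle\kappa^2\rangle/2$, with the residual difference coming from sampling noise and from terms of fourth order in the fluctuations.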


APPENDIX E: SOURCE AVERAGED 
CONVERGENCE 


The mean magnification of sources is almost precisely unity,
but we have shown that the mean inverse magnification,
averaged over sources, differs from unity:

$\langle\mu^{-1}\rangle_A = 1 + \langle(\Delta\mu)^2\rangle + \ldots = 1 + 4\langle\kappa^2\rangle + \ldots$ (E1)


This effect can be understood qualitatively as being a consequence
of extremal paths to sources tending to avoid overdensities
and therefore sampling paths for which the convergence,
on average, is negative. We have invoked this earlier.
Here we expand on this and compute the bias in the
mean convergence, or in the column density of matter, in
the perturbative regime. This may be of some relevance to
cosmic gas abundance measurements from absorption line
studies.

It is well known that images of sources behind clusters
of galaxies appear to be ‘repelled’ by the cluster, being biased
against high-density regions. This makes sense: rather
than going through the centre of a cluster, light rays can
minimise their total travel time by taking a longer path to
one side of the cluster in order to reduce gravitational time
delay near the centre. Another viewpoint on this bias is to
consider a thin screen populated with weak lensing regions
scattered around the sky; some with enhanced surface density
and an equal number of negative lenses, all the lenses
being of the same area. It is easy to see that the light paths
that pass through the negative lenses will diverge and will
map to a larger area of the source sphere than those which
pass through the positive lenses. Thus the observer will see
more sources through the negative lenses than through the
positive lenses (and if the observer can resolve the sources
would see the former to be shrunken relative to the latter).
Averaged over the sources then the mean surface density
fluctuation and convergence will be biased negative. We now
show how this works out with 3-dimensional metric fluctuations
rather than a thin screen.
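
The thin-screen argument can be made quantitative with a two-valued toy model. This is a sketch under stated assumptions, not the calculation that follows: we assume shear-free lenses with convergence $\pm\kappa_0$ covering equal areas of sky, and that a patch of sky maps to an area of the source sphere proportional to $(1-\kappa)^2$ (the standard shear-free inverse magnification). The function name is hypothetical.

```python
def source_weighted_mean_kappa(kappa0):
    """Mean convergence weighted by the source-sphere area behind each lens."""
    kappas = (+kappa0, -kappa0)
    # Source-sphere area per unit sky area: the inverse magnification
    # (1 - kappa)^2 of a shear-free lens (an assumption of this toy model).
    weights = [(1.0 - k) ** 2 for k in kappas]
    return sum(k * w for k, w in zip(kappas, weights)) / sum(weights)

for kappa0 in (0.01, 0.05, 0.1):
    sky_mean_kappa_sq = kappa0 ** 2  # both lens types cover equal sky area
    print(kappa0, source_weighted_mean_kappa(kappa0), -2.0 * sky_mean_kappa_sq)
```

For this model the source-weighted mean is $-2\kappa_0^2/(1+\kappa_0^2)$, which tends to minus twice the sky-averaged mean squared convergence for weak lenses.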

The convergence, for a bundle of rays that arrives at
the observer along the $z$-axis from distance $\lambda_0$, is

$\kappa = \frac{1}{\lambda_0}\int_0^{\lambda_0}d\lambda\,\lambda(\lambda_0 - \lambda)\nabla_\perp^2\phi(\lambda).$ (E2)

At linear order, the integral can be taken along the $z$-axis
and we can ignore the difference between $\nabla_\perp$ and $\nabla_{\bf x}$. The
ensemble average of this vanishes.

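Going beyond first order we must allow for the first order
change to spatial derivatives, $\nabla_\perp^2 \to \nabla_{\bf x}^2 - 2\dot{\bf x}\cdot\nabla_{\bf x}\partial_z$, and also
for the displacement of the path. Using (A18) for $\dot{\bf x}$ and
taking the ensemble average of the extra derivative term
gives a non-zero contribution to $\langle\kappa\rangle$ given at leading order by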


-'o 

5{k,) = ^JdX A(Ao - A)(| Vx<(>|") (E3) 


This scales (with the lens properties) as $\phi^2/L^2$, which is
large compared to the effect on the area, but sub-dominant
here, where the leading order effect scales, like $\langle\kappa^2\rangle$, as $L^{-3}$.
So we can ignore the distinction between $\nabla_\perp$ and $\nabla_{\bf x}$ in (E2).


The first order displacement, also for a ray that arrives
at the observer along the $z$-axis, is, at the source plane,

${\bf x}(\lambda_0) = -2\int_0^{\lambda_0}d\lambda\,(\lambda_0 - \lambda)\nabla_{\bf x}\phi(\lambda).$ (E4)


If there were a source sitting on the $z$-axis then the ray that
we need to fire in order to reach that source would have to
arrive at the observer with direction $\boldsymbol\theta = -{\bf x}(\lambda_0)/\lambda_0$. The
displacement from the $z$-axis for that ray, at some distance
$\lambda$ along the ray, again to first order, is

${\bf x}(\lambda) = -2\int_0^{\lambda}d\lambda'\,(\lambda - \lambda')\nabla_{\bf x}\phi(\lambda')$
$\qquad\quad + \frac{2\lambda}{\lambda_0}\int_0^{\lambda_0}d\lambda'\,(\lambda_0 - \lambda')\nabla_{\bf x}\phi(\lambda')$ (E5)

which vanishes at both ends of the ray.


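To compute the ensemble average of $\kappa$ in (E2), correct to
second order, for the ray that reaches the source we simply need
to replace $\nabla_{\bf x}^2\phi$ in (E2) by $\nabla_{\bf x}^2\phi + {\bf x}(\lambda)\cdot\nabla_{\bf x}\nabla_{\bf x}^2\phi$ with ${\bf x}(\lambda)$
as in (E5). As mentioned, the average of $\nabla_{\bf x}^2\phi$, understood
to be along the unperturbed path, vanishes. This gives a
double integral involving $\langle\nabla_{\bf x}\phi(\lambda')\cdot\nabla_{\bf x}\nabla_{\bf x}^2\phi(\lambda)\rangle$. But under
the assumption that $\phi$ is a statistically homogeneous random
process this is just minus $\langle\nabla_{\bf x}^2\phi(\lambda')\nabla_{\bf x}^2\phi(\lambda)\rangle$ (since each time
we move an index we pick up a factor $i^2 = -1$) which is
clearly a symmetric, positive function of $\lambda - \lambda'$ and
$\langle\nabla_{\bf x}\phi(\lambda')\cdot\nabla_{\bf x}\nabla_{\bf x}^2\phi(\lambda)\rangle = -\nabla_{\bf x}^4\xi_\phi(\lambda - \lambda')$.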

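The result, in gory detail except for suppressing the
argument of $\nabla_{\bf x}^4\xi_\phi(\lambda - \lambda')$, is

$\langle\kappa\rangle_A = -\frac{2}{\lambda_0^2}\int_0^{\lambda_0}d\lambda\,\lambda^2(\lambda_0 - \lambda)\int_0^{\lambda_0}d\lambda'\,(\lambda_0 - \lambda')\nabla_{\bf x}^4\xi_\phi$
$\qquad\quad + \frac{2}{\lambda_0}\int_0^{\lambda_0}d\lambda\,\lambda(\lambda_0 - \lambda)\int_0^{\lambda}d\lambda'\,(\lambda - \lambda')\nabla_{\bf x}^4\xi_\phi.$ (E6)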


These two double integrals are very similar looking, but
are quite different. When we apply the condition that the
correlation length is much less than $\lambda \sim c/H$ we can effectively
replace the factor $\lambda_0 - \lambda'$ by $\lambda_0 - \lambda$ in the first expression
and take it outside the second integral and replace that
with an unrestricted integral $\int dy\,\nabla_{\bf x}^4\xi_\phi(y)$ (which does
not vanish as the integrand is even). In the second line however,
when we change variables in the inner integral from $\lambda'$
to $y = \lambda' - \lambda$ we only have a one-sided integral. That is not
particularly significant, but instead of $\lambda_0 - \lambda'$ we have $\lambda - \lambda'$
which is very small whenever $\nabla_{\bf x}^4\xi_\phi(\lambda - \lambda')$ is not negligible.
The result is that the second line is negligible compared to
the first and, on dropping it, we have

$\langle\kappa\rangle_A = -\frac{2}{\lambda_0^2}\int_0^{\lambda_0}d\lambda\,\lambda^2(\lambda_0 - \lambda)^2\int_{-\infty}^{\infty}dy\,\nabla_{\bf x}^4\xi_\phi(y).$ (E7)


However, if we were calculating the sky-direction weighted
mean convergence then we would not have the second line
in (E5) and we would only have the much smaller term we
are discarding here.
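
The relative size of the two double integrals in (E6) can be checked by brute force. The sketch below is illustrative only: it assumes a unit-amplitude Gaussian $\xi_\phi(r) = \exp(-r^2/2L^2)$, for which the on-axis fourth transverse derivative is $8\xi_\phi/L^4$, sets $\lambda_0 = 1$ and $L = 0.025$, and uses a simple midpoint rule. These choices and the function name are ours. It also evaluates $\langle\kappa^2\rangle$ from (E2) with the same kernel.

```python
import math

def e6_terms(lam0=1.0, corr_len=0.025, n=400):
    """Midpoint-rule estimates of the two lines of (E6), of (E7), and <kappa^2>."""
    h = lam0 / n
    grid = [(i + 0.5) * h for i in range(n)]

    def g(y):
        # On-axis fourth transverse derivative of the assumed Gaussian xi_phi.
        return 8.0 * math.exp(-y * y / (2.0 * corr_len ** 2)) / corr_len ** 4

    first = second = kappa_sq = 0.0
    for lam in grid:
        for lam_p in grid:
            gv = g(lam - lam_p)
            first += lam * lam * (lam0 - lam) * (lam0 - lam_p) * gv
            kappa_sq += lam * (lam0 - lam) * lam_p * (lam0 - lam_p) * gv
            if lam_p < lam:  # inner integral of the second line stops at lam
                second += lam * (lam0 - lam) * (lam - lam_p) * gv
    first *= -2.0 / lam0 ** 2 * h * h
    second *= 2.0 / lam0 * h * h
    kappa_sq *= h * h / lam0 ** 2
    # (E7): the outer integral of lam^2 (lam0 - lam)^2 is lam0^5 / 30 and the
    # unrestricted Gaussian integral is 8 sqrt(2 pi) / L^3.
    e7 = (-2.0 / lam0 ** 2) * (lam0 ** 5 / 30.0) \
        * 8.0 * math.sqrt(2.0 * math.pi) / corr_len ** 3
    return first, second, e7, kappa_sq

first, second, e7, kappa_sq = e6_terms()
print("first line / (E7)        :", first / e7)
print("second line / |first|    :", second / abs(first))
print("first line / <kappa^2>   :", first / kappa_sq)
```

For this choice of parameters the first line reproduces (E7) closely, the second line is smaller by a factor of order $L/\lambda_0$, and the ratio of the first line to $\langle\kappa^2\rangle$ is close to $-2$.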

The integral appearing here is superficially similar to
the definition of $J$ but involves $\nabla_{\bf x}^4\xi_\phi$ rather than $\nabla_{\bf x}^2\xi_\phi$. By
the same reasoning that led us to (A34) one can express this
integral in terms of the power spectrum:

$\int_{-\infty}^{\infty}dy\,\nabla_{\bf x}^4\xi_\phi(y) = \pi\int d\ln k\;k^3\Delta_\phi^2(k).$ (E8)

The extra two powers of $k$ in the integrand as compared
to (A34) mean that this integral is dominated by small-scale
structure. Indeed, if the mass auto-correlation function
is similar to that of galaxies, $\xi \propto r^{-\gamma}$ with $\gamma \simeq 1.8$, then
$\Delta_\phi^2 \propto k^{\gamma - 4}$ and the integral here is $\sim\int d\ln k\,k^{\gamma - 1}$, which
diverges for large $k$ provided $\gamma > 1$.
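
To make the divergence concrete, the scaling integral $\int d\ln k\,k^{\gamma-1}$ can be evaluated in closed form between two cut-offs. The snippet below is a sketch with arbitrary units and a hypothetical function name; the lower cut-off of unity and the list of upper cut-offs are our choices.

```python
def ln_k_integral(gamma, k_min, k_max):
    """Integral of k**(gamma - 1) d ln k from k_min to k_max (gamma != 1)."""
    p = gamma - 1.0
    return (k_max ** p - k_min ** p) / p

for k_max in (1e1, 1e2, 1e3, 1e4):
    steep = ln_k_integral(1.8, 1.0, k_max)    # galaxy-like slope: keeps growing
    shallow = ln_k_integral(0.5, 1.0, k_max)  # gamma < 1: approaches a constant
    print(k_max, steep, shallow)
```

With $\gamma = 1.8$ the result grows as $k_{\rm max}^{0.8}$, so the value of the integral is set by the smallest scales retained, whereas for $\gamma < 1$ it converges.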

The same line of argument gives the expectation for $\langle\kappa^2\rangle$,
with $\kappa$ given by (E2), which is almost identical but with
$+1$ in place of the factor $-2$, so

$\langle\kappa\rangle_A = -2\langle\kappa^2\rangle$ (E9)


so the mean column density along paths to sources is lower
than on average. Despite this, the flux density of sources is
not biased, but the inverse magnification is biased positive.
This result ignores selection effects, however. If sources are
selected according to luminosity there will be a magnification
bias and other effects such as extinction by dust may be
important. These effects have been discussed in the context
of estimation of the neutral HI density from damped Ly-α
systems by Bartelmann & Loeb (1996) using single lenses
modelled as isothermal spheres.